ALIGNMENTS



RESULT 1 
AAE31703 

ID AAE31703 standard; protein; 672 AA. 
XX 

AC AAE31703; 
XX 

DT 24-MAR-2003 (first entry) 
XX 

DE Mouse ABCG8 protein. 
XX 

KW ABC family cholesterol transporter; ABCG8; sterol-related disorder; 

KW sitosterolaemia; hyperlipidaemia; hypercholesterolemia; gall stone; 

KW HDL deficiency; atherosclerosis; nutritional deficiency; gene therapy; 

KW mouse; ATP-binding cassette; sitosterolaemia susceptibility gene; SSG; 

KW ABCG5. 
XX 



OS Mus sp.
XX 

FH Key Location/Qualifiers 

FT Misc-difference 440

FT /note= "Encoded by AAG" 
XX 

PN WO200281691-A2. 
XX 

PD 17-OCT-2002. 
XX 

PF 20-NOV-2001; 2001WO-US043823.
XX

PR 20-NOV-2000; 2000US-0252235P.

PR 28-NOV-2000; 2000US-0253645P. 
XX 

PA (TULA-) TULARIK INC. 

PA (TEXA ) UNIV TEXAS SYSTEM. 

XX 

PI Hobbs HH, Shan B, Barnes R, Tian H; 
XX 

DR WPI; 2003-058548/05. 

DR N-PSDB; AAD48881. 
XX 

PT New ABCG8 polypeptides and nucleic acids, useful for treating sterol- 

PT related disorders e.g. sitosterolemia, hypercholesterolemia, 

PT hyperlipidemia, gall stones, HDL deficiency, atherosclerosis, or 

PT nutritional deficiencies. 

XX 

PS Claim 22; Page 76; 94pp; English. 
XX 

CC The invention relates to ATP-binding cassette (ABC) family cholesterol 

CC transporter, ABCG8 polypeptides and polynucleotides. The invention also 

CC provides ABCG5 polypeptides and polynucleotides. ABCG5 gene is also known 

CC as sitosterolaemia susceptibility gene (SSG). Sequences of the invention

CC are useful for treating or preventing sterol-related disorders such as 

CC sitosterolaemia, hyperlipidaemia, hypercholesterolemia, gall stones, HDL 

CC deficiency, atherosclerosis and nutritional deficiencies. They are also 

CC useful in gene therapy. The present sequence is mouse ABCG8 protein 
XX 

SQ Sequence 672 AA; 

Query Match 99.8%; Score 3487; DB 6; Length 672; 
Best Local Similarity 99.9%; Pred. No. 0; 

Matches 671; Conservative 0; Mismatches 1; Indels 0; Gaps 0; 
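
The summary figures reported for each result are consistent with a simple reading: Best Local Similarity appears to be the number of identical columns divided by the total number of aligned columns (matches + conservative + mismatches + indels). The Python sketch below rests on that assumption, which the listing does not state explicitly; the function name is illustrative only.

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Percent of aligned columns that are identical (assumed definition)."""
    columns = matches + conservative + mismatches + indels
    if columns == 0:
        raise ValueError("alignment has no columns")
    return round(100.0 * matches / columns, 1)


# Counts copied from the RESULT 1 summary line (671 / 0 / 1 / 0).
print(best_local_similarity(671, 0, 1, 0))
```

Under the same assumption, the counts listed for the other results (for example 551, 52, 69 and 1) reproduce their reported percentages as well.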

Qy   1 MAEKTKEETQLWNGTVLQDASGLQDSLFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIAS 60
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1 MAEKTKEETQLWNGTVLQDASGLQDSLFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIAS 60

Qy  61 QVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  61 QVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120

Qy 121 RGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFS 180
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 121 RGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFS 180

Qy 181 QAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPT 240
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 181 QAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPT 240

Qy 241 SGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQM 300
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 241 SGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQM 300

Qy 301 VQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFL 360
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 301 VQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFL 360

Qy 361 WKAEAKELNTSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHG 420
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 361 WKAEAKELNTSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHG 420

Qy 421 SEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDWSKCHSERSMLYYE 480
       ||||||||||||||||||| |||||||||||||||||||||||||||||||||||||||
Db 421 SEACLMSLIIGFLYYGHGALQLSFMDTAALLFMIGALIPFNVILDWSKCHSERSMLYYE 480

Qy 481 LEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLWFC 540
       |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 481 LEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLWFC 540

Qy 541 CRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFS 600
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 541 CRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFS 600

Qy 601 GLMQIQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYYLS 660
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 601 GLMQIQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYYLS 660

Qy 661 LKLIKQKSIQDW 672
       ||||||||||||
Db 661 LKLIKQKSIQDW 672



RESULT 2
AAE31705

ID AAE31705 standard; protein; 673 AA.
XX

AC AAE31705;
XX

DT 24-MAR-2003 (first entry)
XX

DE Human ABCG8 protein.
XX

KW ABC family cholesterol transporter; ABCG8; sterol-related disorder;
KW sitosterolaemia; hyperlipidaemia; hypercholesterolemia; gall stone;
KW HDL deficiency; atherosclerosis; nutritional deficiency; gene therapy;
KW human; ATP-binding cassette; sitosterolaemia susceptibility gene; SSG;
KW ABCG5.
XX

OS Homo sapiens.
XX

PN WO200281691-A2.
XX







PD 17-OCT-2002. 
XX 

PF 20-NOV-2001; 2001WO-US043823.
XX

PR 20-NOV-2000; 2000US-0252235P.

PR 28-NOV-2000; 2000US-0253645P.
XX 

PA (TULA-) TULARIK INC. 

PA (TEXA ) UNIV TEXAS SYSTEM. 

XX 

PI Hobbs HH, Shan B, Barnes R, Tian H; 
XX 

DR WPI; 2003-058548/05. 

DR N-PSDB; AAD48883. 
XX 

PT New ABCG8 polypeptides and nucleic acids, useful for treating sterol- 

PT related disorders e.g. sitosterolemia, hypercholesterolemia, 

PT hyperlipidemia, gall stones, HDL deficiency, atherosclerosis, or 

PT nutritional deficiencies. 

XX 

PS Claim 22; Page 81-82; 94pp; English. 
XX 

CC The invention relates to ATP-binding cassette (ABC) family cholesterol 

CC transporter, ABCG8 polypeptides and polynucleotides. The invention also 

CC provides ABCG5 polypeptides and polynucleotides. ABCG5 gene is also known 

CC as sitosterolaemia susceptibility gene (SSG). Sequences of the invention

CC are useful for treating or preventing sterol-related disorders such as 

CC sitosterolaemia, hyperlipidaemia, hypercholesterolemia, gall stones, HDL 

CC deficiency, atherosclerosis and nutritional deficiencies. They are also 

CC useful in gene therapy. The present sequence is human ABCG8 protein 
XX 

SQ Sequence 673 AA; 

Query Match 82.5%; Score 2883.5; DB 6; Length 673; 
Best Local Similarity 81.9%; Pred. No. 6.6e-276; 

Matches 551; Conservative 52; Mismatches 69; Indels 1; Gaps 1;

Qy 1 MAEKTKEETQLWNGTVLQDASGLQDSLFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIAS 60 

Ml II I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I : I I 

Db 1 MAGKAAEERGLPKGATPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLAS 60 



Qy 61 QVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120 

| | || || | I I I M : I I I I I : I I I I I I : I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 61 QVPWFEQLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120 

Qy 121 RGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFS 180

II I I I |:| II I I I Ml 111:111 I I Mill Ml l:M II I I I II Ml IN IMM MMI 
Db 121 RGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFS 180

Qy 181 QAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPT 240 

I | | | | I I I I II I I I I I I I I I I : I I I I I I II M M I I M II I I I I II I I I M I II I II M 
Db 181 QAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPT 240 

Qy 241 SGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQM 300 

M I II I I I I I I I I I I I II I I I I II I M I II I I I I I M I I I I I I I I I I I I I I I I I I I I I 
Db 241 SGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHM 300 



Qy 301 VQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFL 360 

| | | | | : I I : I I I I I I I I I I I I I I I I I I I I II : I : I : II I I I I I I I I I I I I I I : I I I I 
Db 301 VQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQEIATREKAQSLAALFLEKVRDLDDFL 360

Qy 361 WKAEAKELNTSTHTVSLTLTQDTDC-GTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIH 419 

| | | | i : | : | | ||:| : : : || : : I I : I I II I I II I II II I I I I I I I 

Db 361 WKAETKDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLIH 420

Qy 420 GSEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDWSKCHSERSMLYY 479

I : I I I I I I : llllhlll: I I I I I I I I II I I I I I I I I I I I I M I I : I I I : I I I : I I I I 

Db 421 GAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLYY 480

Qy 480 ELEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLWF 539

Mill I I I I MUM II III MM : Ill 

Db 481 ELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLWF 540 

Qy 540 CCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCF 599

Ml | | | || : | : I || I I I : I II I I I I II II I I M I II Ml I M I I II M I I I I II 
Db 541 CCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCF 600

Qy 600 SGLMQIQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYYL 659 

|||:|||: I Ml I : : II : M I I M M M I I I II I II I I M M : MM 
Db 601 EGLMKIQFSRRTYKMPLGNLTIAVSGDKILSAMELDSYPLYAIYLIVIGLSGGFMVLYYV 660 

Qy 660 SLKLIKQKSIQDW 672

II : I I I I Ml 

Db 661 SLRFIKQKPSQDW 673 



RESULT 3 
ABP52129 

ID ABP52129 standard; protein; 673 AA. 
XX 

AC ABP52129; 
XX 

DT 10-OCT-2002 (first entry) 
XX 

DE Homo sapiens ABC transporter ABCG8 protein SEQ ID NO: 81. 
XX 

KW ATP-binding cassette transporter; ABC transporter; modulation; D loop; 
KW cancer; bacterial infection; fungal infection; protozoal infection; 
KW antibacterial; fungicide; protozoacide.
XX

OS Homo sapiens.
XX

PN EP1217066-A1.
XX 

PD 26-JUN-2002. 
XX 

PF 21-DEC-2000; 2000EP-00870316.
XX

PR 21-DEC-2000; 2000EP-00870316.
XX 

PA (UYGE-) UNIV GENT. 
XX 

DR WPI; 2002-550404/59. 
XX 



PT Modulating activity of ATP-binding cassette (ABC) transporters by 

PT influencing dimerization of nucleotide binding domains through use of D 

PT loop sequence of an ABC transporter, or its antisense peptide or peptide 

PT mimetic. 

XX 

PS Disclosure; Fig 3; 290pp; English. 
XX 

CC The present invention describes a method (M1) for modulating the activity

CC of ATP-binding cassette (ABC) transporters by influencing the

CC dimerisation of the nucleotide binding domains comprises using: (a) a

CC polypeptide (polyP) consisting of 5-50 amino acids comprising the D loop

CC sequence of an ABC transporter (ABP52049 to ABP52091); (b) a polyP

CC consisting of the D loop sequence of an ABC transporter; (c) a peptide

CC mimetic or antisense peptide of (a) or (b). ABC transporters have

CC antibacterial, fungicide and protozoacide activities. (M1) is useful for

CC selectively modulating the activity of ABC transporters belonging to the

CC group of multidrug transporter/P-glycoproteins. Bacterial, fungal or

CC protozoal ABC transporters are involved in the infection of a mammal or

CC in the induction of resistance to antibiotics or drugs in a mammal. (M1)

CC is useful for preventing, treating or alleviating diseases associated 

CC with functionality of an ABC transporter. ABP52092 to ABP52140 represent 

CC ABC transporter proteins given in the exemplification of the present 

CC invention 

XX 

SQ Sequence 673 AA; 

Query Match 82.4%; Score 2879.5; DB 5; Length 673; 

Best Local Similarity 81.7%; Pred. No. 1.6e-275; 

Matches 550; Conservative 52; Mismatches 70; Indels 1; Gaps 1; 

Qy 1 MAEKTKEETQLWNGTVLQDASGLQDSLFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIAS 60 

Ml || I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I : I I 
Db 1 MAGKAAEERGLPKGATPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLAS 60 

Qy 61 QVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120

| | | | | | | | | | | | : I I I I I : I I I I I I : I I I M I I I I I I I I I I I M I I I I I I I I I I I I I 
Db 61 QVPWFEQLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120

Qy 121 RGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFS 180 

I I I I I I : I I I I II I I I I I I : II I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I M I I 
Db 121 RGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFS 180 

Qy 181 QAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPT 240 

I | | | M || I I I I I I I I I I I I I : I I I I I I I I I = I I M I I I I I I I I M I I I I I I II I I I I I 
Db 181 QAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPT 240 

Qy 241 SGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQM 300 

| | | | | | I I I I I I II I I I I I I I I I I I I I I I I M I I I I I I II II I I I I I I I I I I 

Db 241 SGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHM 300 

Qy 301 VQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFL 360

I | M |:| |:| I I I I I I I M I I I I M I I M M:|:|:l I I I I I I I I I I M I I I : MM 
Db 301 VQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALFLEKVRDLDDFL 360 

Qy 361 WKAEAKELNTSTHTVSLTLTQDTDC-GTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIH 419 

| | | | | : | : | | MM : : M I : M I M I II I II I M I I I II I I M I 

Db 361 WKAETKDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLIH 420 



Qy 420 GSEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDWSKCHSERSMLYY 479 

| : I I I I I I : 11111:111: I I I I I I I I I I I I I I I I I I I I M I I I I : I I I : I I I : I I I I 
Db 421 GAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVTLDVTSKCYSERAMLYY 480

Qy 480 ELEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLWF 539 

I I I I I I I I II I I I I I I I I I I I I I I I I : I I I II III I I I I : I I I I I I I I I I I I I 
Db 481 ELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLWF 540

Qy 540 CCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCF 599

Ml | | I I I : I : I I I I I I : I I I I I I II I I I I I I I I I I : I I I I I I I I I : I M I I I I 
Db 541 CCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCF 600

Qy 600 SGLMQIQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYYL 659

|||:|||: | :M I :: II ::| I : I : I : I I I I I I I I I I I : I II: Ml: 
Db 601 EGLMKIQFSRRTYKMPLGNLTIAVSGDKILSVMELDSYPLYAIYLIVIGLSGGFMVLYYV 660 

Qy 660 SLKLIKQKSIQDW 672 

I I : I I I I III 

Db 661 SLRFIKQKPSQDW 673 
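
The identity count of any Qy/Db row pair can be checked by comparing the two rows column by column. The Python sketch below is an illustration only: the helper name is hypothetical, and it lumps conservative substitutions together with mismatches, because telling them apart needs the scoring matrix, which this listing does not give.

```python
def count_columns(qy, db):
    """Return (identical, non_identical) column counts for two aligned rows.

    Gap characters ('-') never count as identical. Conservative
    substitutions are counted as non-identical here (simplifying assumption).
    """
    if len(qy) != len(db):
        raise ValueError("aligned rows must have equal length")
    identical = sum(1 for a, b in zip(qy, db) if a == b and a != "-")
    return identical, len(qy) - identical


# Final row pair of RESULT 3 (Qy 660-672 against Db 661-673).
print(count_columns("SLKLIKQKSIQDW", "SLRFIKQKPSQDW"))
```

Summing the first element over every row of an alignment should give the "Matches" figure of its summary line, provided the rows were transcribed without error.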



RESULT 4 
ABG61539 

ID ABG61539 standard; protein; 374 AA. 
XX 

AC ABG61539; 
XX 

DT 27-AUG-2002 (first entry) 
XX 

DE Human transporter and ion channel, TRICH9, Incyte ID 6585710CD1. 
XX 

KW Human; transporter and ion channel; TRICH; transport disorder; 

KW neurological disorder; muscle disorder; immunological disorder; cancer; 

KW scleroderma; systemic lupus erythematosus; allergy; leukaemia; 

KW cell proliferative disorder; cervical cancer; breast cancer; 

KW neurodegenerative disorder; Parkinson's disease; Alzheimer's disease; 

KW myotonic dystrophy; catatonia; endocrine disorder; diabetes; 

KW Grave's disease; gastrointestinal disorder; Crohn's disease; 

KW renal disorder; Goodpasture's syndrome; viral infection; cirrhosis;

KW bacterial infection; fungal infection; parasitic infection; 

KW protozoal infection; helminthic infection; cardiovascular disorder; 

KW atherosclerosis; hepatic disease. 

XX 

OS Homo sapiens. 
XX 

PN WO200240541-A2. 
XX 

PD 23-MAY-2002. 
XX 

PF 25-OCT-2001; 2001WO-US046055.
XX

PR 27-OCT-2000; 2000US-0243989P.

PR 03-NOV-2000; 2000US-0245904P.

PR 09-NOV-2000; 2000US-0247673P.

PR 17-NOV-2000; 2000US-0249661P.

PR 20-NOV-2000; 2000US-0252232P.

PR 01-DEC-2000; 2000US-0250790P.
XX 

PA (INCY-) INCYTE GENOMICS INC. 
XX 

PI Tang YT, Yue H, Nguyen DB, Hafalia AJA, Elliott VS, Lu Y; 

PI Walia NK, Yao MG, Baughn MR, Gandhi AR, Ding L, Sanjanwala M; 

PI Ramkumar J, Arvizu C, Gietzen KJ, Lai PG, Azimzai Y, Khan FA; 

PI Thangavelu K, Thornton M, Lu DAM, Tribouley CM, Warren BA, Ison CH;

PI Das D, Raumann BE, Policky JL, Kearney L; 

XX 

DR WPI; 2002-463570/49. 

DR N-PSDB; ABK83218. 
XX 

PT New transporters and ion channels (TRICH) polypeptides, useful for 

PT diagnosing, preventing, and treating disorders associated with an 

PT abnormal expression or activity of TRICH, e.g. immunological, muscular or 

PT renal disorders. 

XX 

PS Claim 1; Page 143-144; 178pp; English. 
XX 

CC The invention relates to human transporters and ion channels (TRICH) 

CC polypeptides, a naturally occurring amino acid sequence 90 % identical to 

CC TRICH, a biologically active fragment of TRICH or an immunogenic fragment 

CC of TRICH. Also included are an isolated polynucleotide encoding TRICH, a 

CC recombinant polynucleotide comprising a promoter sequence operably linked 

CC to the TRICH polynucleotide, a cell transformed with the recombinant 

CC polynucleotide, a transgenic organism comprising the recombinant 

CC polynucleotide, an isolated antibody that binds specifically to TRICH, 

CC and screening for compounds which bind to TRICH, modulate TRICH, modulate 

CC TRICH expression or are ant/agonists of TRICH. The polypeptides are 

CC useful for diagnosing, treating, and preventing transport, neurological, 

CC muscle, immunological disorders (e.g. scleroderma, systemic lupus 

CC erythematosus, allergies), cell proliferative disorders such as cancers 

CC (e.g. leukaemia, cervical or breast cancers), neurodegenerative disorders 

CC (e.g. Parkinson's disease, Alzheimer's disease), muscular disorders (e.g. 

CC myotonic dystrophy, catatonia), endocrine disorders (e.g. diabetes, 

CC Grave's disease), gastrointestinal disorders (e.g. Crohn's disease), 

CC renal disorders (e.g. Goodpasture's syndrome), viral, bacterial, fungal, 

CC parasitic, protozoal and helminthic infections, cardiovascular disorders 

CC (e.g. atherosclerosis), or hepatic diseases (e.g. cirrhosis) and many 

CC other diseases and disorders detailed in the specification. They can also 

CC be used in assessing the effects of exogenous compounds on the expression 

CC of nucleic acid and amino acid sequences of transporters and ion 

CC channels. TRICH or its fragments may also be used in screening for 

CC compounds that specifically bind to and modulate the activity of TRICH. 

CC The polynucleotides can be used to create knock-in humanised animals or 

CC transgenic animals to model human disease. The present sequence 

CC represents a TRICH protein 

XX 

SQ Sequence 374 AA; 

Query Match 43.2%; Score 1508.5; DB 5; Length 374; 

Best Local Similarity 74.9%; Pred. No. 4.2e-140; 

Matches 280; Conservative 43; Mismatches 50; Indels 1; Gaps 1;
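
The Query Match percentages in this listing behave like the alignment score divided by a fixed self-score for the query. The Python sketch below assumes a self-score of 3494; that value is not stated anywhere in this listing and is inferred only because it reproduces the reported Score and Query Match pairs. The constant and function names are illustrative.

```python
# Assumed query self-score: inferred from the reported figures, not stated
# in the listing itself.
ASSUMED_SELF_SCORE = 3494.0


def query_match(score, self_score=ASSUMED_SELF_SCORE):
    """Alignment score as a percentage of the assumed query self-score."""
    return round(100.0 * score / self_score, 1)


# Score copied from the RESULT 4 summary line.
print(query_match(1508.5))
```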



Qy 300 MVQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDF 359
|| | M : I I : I I I II I M I I I I I I I I I I I I I I : I : I : I I I I I I I I I I I II I I I : III 



Db 1 MVHYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALFLEKVRDLDDF 60

Qy 360 LWKAEAKELNTSTHTVSLTLTQDTDC-GTAVELPGMIEQFSTLIRRQISNDFRDLPTLLI 418 

HIM |:|: | I ||:| : :: I I :: I I : I I I I I I I I I I I I I I I I I I I 

Db 61 LWKAETKDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLI 120 

Qy 419 HGSEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDWSKCHSERSMLY 478

||:||||||: 11111:111: I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I : I I I : I I I 
Db 121 HGAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLY 180

Qy 479 YELEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLW 538 

I I M I I I II I I I I I I I I I I I I I I I I I I : I I I M Ml I I I I : I I I I I I I I I I I I 
Db 181 YELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLW 240

Qy 539 FCCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWC 598

MM I I II I : i : I I I I I I : II I I I L I I I I I I MINI Ml I I I I I t I i I I I I I I 
Db 241 FCCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAWISKVSFLRWC 300

Qy 599 FSGLMQIQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYY 658

| |||:|||: I : I I I :: II : : I I I : I : I : I I I I I M I II I : I I I : III 
Db 301 FEGLMKIQFSRRTYKMPLGNLTIAVSGDKILSAMELDSYPLYAIYLIVIGLSGGFMVLYY 360 

Qy 659 LSLKLIKQKSIQDW 672 

: I I : I I I I III 
Db 361 VSLRFIKQKPSQDW 374 



RESULT 5
AAG18078

ID AAG18078 standard; protein; 648 AA.
XX

AC AAG18078;
XX

DT 17-OCT-2000 (first entry)
XX

DE Arabidopsis thaliana protein fragment SEQ ID NO: 19343.
XX

KW Protein identification; signal transduction pathway; metabolic pathway;
KW hybridisation assay; genetic mapping; gene expression control; promoter;
KW termination sequence.
XX

OS Arabidopsis thaliana.
XX

PN EP1033405-A2.
XX

PD 06-SEP-2000.
XX

PF 25-FEB-2000; 2000EP-00301439.
XX
99US-0128714P.

99US-0129845P. 

99US-0130077P. 

99US-0130449P. 

99US-0130510P. 

99US-0130891P. 

99US-0131449P. 

99US-0132048P. 

99US-0132407P. 

99US-0132484P.

99US-0132485P. 

99US-0132486P. 

99US-0132487P. 
99US-0134219P.
99US-0135353P.

99US-0135629P. 
Query Match 22.4%; Score 782; DB 3; Length 648;

Best Local Similarity 31.1%; Pred. No. 9.3e-68; 

Matches 216; Conservative 130; Mismatches 266; Indels 82; Gaps 22; 
: I : I I II :l hi : : :: I |: ll::| II I:
ID AAG18079 standard; protein; 632 AA.
XX 

AC AAG18079; 
XX 

DT 17-OCT-2000 (first entry) 
XX 

DE Arabidopsis thaliana protein fragment SEQ ID NO: 19344. 
XX 

KW Protein identification; signal transduction pathway; metabolic pathway; 

KW hybridisation assay; genetic mapping; gene expression control; promoter; 

KW termination sequence. 
XX 

OS Arabidopsis thaliana. 
XX 

PN EP1033405-A2. 
XX 

PD 06-SEP-2000. 
XX 

PF 25-FEB-2000; 2000EP-00301439.
XX

PR 25-FEB-1999; 99US-0121825P.

PR 05-MAR-1999; 99US-0123180P.
PR 30-APR-1999; 99US-0132048P.
PR 04-MAY-1999; 99US-0132484P.

PR 05-MAY-1999; 99US-0132485P.
Query Match 22.3%; Score 779.5; DB 3; Length 632;

Best Local Similarity 31.5%; Pred. No. 1.6e-67;

Matches 212; Conservative 124; Mismatches 262; Indels 75; Gaps 20;
99US-0160741P.

99US-0160767P. 

99US-0160768P. 

99US-0160770P. 

99US-0160814P. 

99US-0160815P. 

99US-0160980P. 

99US-0160981P. 

99US-0160989P. 

99US-0161404P. 

99US-0161405P. 

99US-0161406P. 

99US-0161359P. 

99US-0161360P. 

99US-0161361P. 

99US-0161920P.

99US-0161992P. 

99US-0161993P. 

99US-0162142P. 



Query Match 22.3%; Score 778; DB 3; Length 625;

Best Local Similarity 31.3%; Pred. No. 2.2e-67; 

Matches 208; Conservative 121; Mismatches 255; Indels 80; Gaps 



Qy 22 GLQDSLFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIASQVPWFEQLAQFKIPWRSHSSQ 81
||||:: | | : : : | : | | 1 1 : 1 1 : 1
Db 20 GLQMSMYP I TLKFEEWYKVKI EQTSQCMGSWKSKE- - 55

Qy 82 DSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTP 141
: : : : | | : I 1 : : 1 1 1 1 : : 1 1 : 1 1 1 M : : 1 1 1 1 :
Db 56 KT I LNGI TGMVCPGE FLAMLGP S GS GKTTLLS ALGGR — LS KT FS GKVMYNGQP FS G 110

Qy 142 QLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCA 201
: | : | | | I I : I 1 1 1 1 1 1 1 : 1 1 1 : : : : — 1 : 1 M M 1 : 1
Db 111 CIKRR-TGFVAQDDVLYPHLTVWETLFFTALLRLPSSLTRDEK7VEHVDRVIAELGLNRCT 169

Qy 202 NTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKG 261
|: :| | | |:: I I I 1 1 ::| II : 1 : 1 1 1 1 1 1 1 1 1 1 III :)ll: Ml 1
Db 170 NSMIGGPLFRGISGGEKKRVSIGQEMLINPSLLLLDEPTSGLDSTTAHRIVTTIKRLASG 229

Qy 262 NRLVLI S LHQPRS DI FRLFDLVLLMT S GT P I YLGAAQQMVQYFT S I GHPCPRYSNPADFY 321
| | : : : | I I 1 1 : : 1 1 1 : 1 : : 1 : 1 1 1 1 1 1 1 : 1 1 : 1 : 1 1 1 1 1
Db 230 GRTVVTTIHQPSSRIYHMFDKVVLLSEGSPIYYGAASSAVEYFSSLGFSTSLTVNPADLL 289

Qy 322 VDLTS IDRRSKEREVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVS 376
:| | : : : |:| 1 h ::| : : : : 1 : 1 : 1 :|:
Db 290 LDLANGIPPDTQKETSEQEQKTVK— ETLVSAYEKNIS TKLKAELCNAESHSYE 341

Qy 377 LT LTQDTDCGTAVELPGMIEQFSTLIRRQI-SNDFRDLPTLLIHGSEACLMSLI 429
| | : | | M : 1 :: 1 : 1 1 1 :
Db 342 YTKAAAKNLKSEQWCTT WWYQFTVLLQRGVRERRFESFNKLRIF QVISVAF 392

Qy 430 I GFL Y YGHGAKQLS FMDTAALL FMI GAL I P FNVI LDWS KCH S ERSMLYYELEDGL YTAG 489
: | I : I 1 1 1 1 1 1 1 : : 1 1 : 1 1 1 1 = 1
Db 393 LGGLLWWHTPKS-HIQDRTALLFFFSVFWGFYPLYNAVFTFPQEKRMLIKERSSGMYRLS 451

Qy 490 PYFFAKILGELPEHCAWIIYAMPIYWLTNLRPVPELFLLHFLLWLWFCCRTMALAAS 549
| | | : : | : | 1 1 : III: hi 1 1 : 1 1 : 1 1 : : N
Db 452 SYFMARNVGDLPLELALPTAFVFIIYWMGGLKPDPTTFILSLLVV^ 511

Qy 550 AMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQIQFNG 609
. i . . . i . i . . .11 1 - II* • 1 • 1 ; 1 I :
I : 1 : : ; i . i . - - i i i • ■ ■ • • ■ • ■
Db 512 ALLMNIKQATTLASVTTLVFLIAGGYYVQQIPPFIV — WLKYLSYSYYCYKLLLGIQYTD 569

Qy 610 HLY TTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGIS-YGFLFLYYLSL 661
| : : | : I 1 : 1 1 1 : 1 : 1 : 1 : 1 : : 1 : : 1
Db 570 DDYYECSKGVWCRVGDF PAIKSMGLNN LWIDVFVMGVMLVGYRLMAYMAL 619

Qy 662 KLIK 665
: 1
Db 620 HRVK 623





RESULT 8 
AAU96986 

ID AAU96986 standard; protein; 652 AA. 



XX 

AC AAU96986; 
XX 

DT 07-AUG-2003 (revised) 

DT 30-JUL-2002 (first entry) 

XX 

DE Rat ABCG5 protein. 
XX 

KW Rat; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol;

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease.
XX 

OS Rattus sp. 
XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859.
XX

PR 25-SEP-2000; 2000US-0235268P.
XX

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES.

PA (PATE/) PATEL S B.

PA (DEAN/) DEAN M.
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 

DR N-PSDB; ABK51686. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Example 3; Page 45; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present amino 

CC acid sequence represents the rat ABCG5 protein of the invention. (Updated 

CC on 07-AUG-2003 to correct OS field.) 



XX 

SQ Sequence 652 AA; 



Query Match 20.3%; Score 710; DB 5; Length 652; 

Best Local Similarity 30.2%; Pred. No. 1.3e-60; 

Matches 190; Conservative 129; Mismatches 258; Indels 52; Gaps 15; 

Qy 18 QDASGLQDSLFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIASQV- PWFEQLAQFKIPWR 76 

: | | :: I I II : : I I :::: I :::| II I 
Db 10 EGARGPHNNRGSQSSLEEGSVTGSEARHSLGVLNVSFSV— SNRVGPW WN 57 

Qy 77 SHSSQDSCELGI-RNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWIN 135 

|| : | : : : I : I I I : I : I I I I I : : I I I I : I I 
Db 58 IKSCQQKWDRKILKDVSLYIESGQTMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVN 117 

Qy 136 GQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAEL 195

| : ||::: | | I : M II I I I : I : I I : I Ihll |: II 

Db 118 GCELRRDQFQDCVSYLLQSDVFLSSLTWETLRYTAML7VL-RSSSADFYDKKVEAVLTEL 176

Qy 196 RLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTL 255 

| | : : | | I : I I I II I I I I I I I : I : : : I I I I I : I I I I I : : : I I 

Db 177 SLSHVMQMIGNYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANHIVLLL 236

Qy 256 SRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYS 315 

||: ||:|::::MM|::| II : ::| I :: I : : I : : I : I : I I I : I 
Db 237 VELARRNRIVIVTIHQPRSELFHHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHS 296 

Qy 316 NPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTV 375

II |||:IMI:I HUM: I :: I I : I : I I * : : I 

Db 297 NPFDFYMDLTSVDTQSREREIETYKRVQMLESAFRQ SDICHKI-LENIERTRHLK 350 

Qy 376 SLTL TQDTDCGTAVELPGMI EQFSTLI RRQI SNDFRDLPTLLIHGSEACLMSLI I G 431 

: | : | : : III: I : I I I I : : : : : : I I : 

Db 351 TLPMVPFKTKNP PGMFCKLGVLLRRVTRNLMRNKQWIMRLVQNLIMGLFLI 402

Qy 432 F--LYYGHGAKQLSFMDTAALLFMIGALIPFNVILDWSKCHSERSMLYYELEDGLYTAG 489

|| : : : | ||: : |: :|: |: |:: I -MM 

Db 403 FYLLRVQNNMLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYQKW 462

Qy 490 PYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELF LL — HFLLVWLWFCC 541 

I :| II :|:: II II I || I : :| 
Db 463 QMLLAYVLHALP FS I VATVI FS S VCYWTLGLYPEVARFGYFS AALLAPHLI GEFL 517 

Qy 542 RTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSG 601

|: | | ::| : : :| |: |:: : I : :| ::l 

Db 518 -TLVLLGMVQNPNI-VNSIVALLSISGLLIGSGFIRNIEEMPIPLKILGYFTFQKYCCEI 575 

Qy 602 LMQIQFNGHLYTTQIGNFTFSILGDTMIS 630

I : : I I : I I | : : I I 

Db 576 LWNEFYGLNFT--CGGSNTSVPNNPMCS 602



RESULT 9 
AAU96990 

ID AAU96990 standard; protein; 651 AA. 
XX 

AC AAU96990; 



XX 

DT 30-JUL-2002 (first entry) 
XX 

DE Human ABCG5 mutant R389H protein sequence. 
XX 

KW Human; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease;

KW mutant; mutein. 
XX 

OS Homo sapiens. 

OS Synthetic. 
XX 

FH Key Location/Qualifiers 

FT Misc-difference 389

FT /note= "Wild-type Arg substituted by His" 
XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859.
XX

PR 25-SEP-2000; 2000US-0235268P.
XX

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES.

PA (PATE/) PATEL S B.

PA (DEAN/) DEAN M.
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Claim 7; Page; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present amino 



CC acid sequence represents the human ABCG5 mutant R389H protein of the 

CC invention. Note: This sequence is not shown in the specification but is 

CC derived from the wild-type human ABCG5 protein (AAU96984) given on pages 

CC 35-36 of the specification 

XX 

SQ Sequence 651 AA; 

Query Match 20.2%; Score 705; DB 5; Length 651; 

Best Local Similarity 29.2%; Pred. No. 4e-60; 

Matches 196; Conservative 129; Mismatches 262; Indels 84; Gaps 18; 


Qy 17 LQDASGLQDSL FSSESDNSLYFTYSGQSNTLEVRDLTYQVDIASQVPWFEQLAQFK 72 

|| I 1 1 1 :: :| 1 :: 1 : 1 1 M : : : : 

Db 15 LQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVR PWWD-ITSCR 61 

Qy 73 IPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGR-GHGGKMKSGQ 131 

| : : : : 1 1 1 1 1 : : 1 : 1 1 1 1 1 : : 1 1 1 : : II 1 1 1 : 

Db 62 QQWTRQI LKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTF-LGE 112 

Qy 132 IWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDV 191 

: : : I I : : : 1 : : 1 1 1 1 1 : 1 1 1 1 1 1 1 • 1 - - 1 * 

Db 113 VYVNGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAI-RRGNPGSFQKKVEAV 171 

Qy 192 IAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNL 251 

:||| I |: :|| : |:| II Mill 1 1 1 : 1 : : : 1 1 1 1 : 1 1 1 II: : 

Db 172 MAELSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQI 231 

Qy 252 VTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPC 311 

| | ||: ||:|::::|||||::|:|ll : ::: 1 1: 1 :h :l l-H 

Db 232 VVLLVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPC 291 

Qy 312 PRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTS 371 

1 : 1 1 1 1 1 1 : 1 1 1 1 : 1 : 1 1 1 1 1 : 1 :: 1 : : : : : 1 : 

Db 292 PEHSNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSA ICHKTLKNIERM 345 

Qy 372 THTVSLTL TQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMS 427 

|:|: 1 : 1 : 1 1 : : 1 = 11 1 1 : :: 1 : : 1 

Db 346 KHLKTLPMVPFKTKDS PGVFSKLGVLLRRVTRNLVRNKLAVITHLLQNLIMG 397 

Qy 428 LIIGFLYYGHGAKQL-- SFMDTAALLFMIGALIPFNVILDWSKCHSERSMLYYELEDGL 485 

| : I : | : | ||: |: :|: |: |:: 1 :||l 

Db 398 LFLLFFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGL 457 

Qy 486 YTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELF LL -- HFLLVWLV 537 

| 1 1 II :h: II 1 1 1 || I : :| 

Db 458 YQKWQMMLAYALHVLPFSWATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFL- 516 

Qy 538 VFCCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRW 597 

|:| 1 ::| : :||: |: : I II :| :: 

Db 517 TLVLLGIVQNPNI-VNSWALLSIAGVLVGSGFLRNIQEMPIPFKIISYFTFQKY 570 

Qy 598 CFSGLMQIQFNGHLYTTQIGNFTFSILGDTM ISAMDLNSHPLY 640 

I | : : | | : 1 1 : 1 : : 1 1 : 1 1 1 

Db 571 CSEILWNEFYGLNFT -- CGSSNVSVTTNPMCAFTQGIQFIEKTCPGATSRFTMNFLILY 628 

Qy 641 AIY -- LIVIGI 649 

Db 629 SFIPALVILGI 639 



RESULT 10 
AAU96985 

ID AAU96985 standard; protein; 652 AA. 
XX 

AC AAU96985; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE Mouse ABCG5 protein. 
XX 

KW Mouse; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease. 
XX 

OS Mus sp. 
XX 

FH Key Location/Qualifiers 

FT Misc-difference 638..652 

FT /note= "Encoded by CTAG" 
XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859. 
XX 

PR 25-SEP-2000; 2000US-0235268P . 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 

DR N-PSDB; ABK51684. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Example 3; Page 42; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 



CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present amino 

CC acid sequence represents the mouse ABCG5 protein of the invention 

XX 

SQ Sequence 652 AA; 

Query Match 20.1%; Score 702.5; DB 5; Length 652; 

Best Local Similarity 29.4%; Pred. No. 7.1e-60; 

Matches 195; Conservative 127; Mismatches 252; Indels 89; Gaps 18; 
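
Note on the statistics above: the "Best Local Similarity" figure appears to be the number of identical matches divided by the total number of alignment columns. The sketch below checks that reading against the reported counts. The formula is an inference from the numbers in this listing, not something the listing states, and the function name is hypothetical.

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Percent of alignment columns that are identical matches.

    Assumption: the alignment length is the sum of the four reported
    column categories (matches, conservative, mismatches, indels).
    """
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns


# Counts reported for this result (AAU96985).
pct = best_local_similarity(195, 127, 252, 89)
print(f"Best Local Similarity: {pct:.1f}%")
```

Under that assumption the computed value rounds to the 29.4% reported for this result, and the same calculation rounds to the reported percentages for the other results in this listing.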

Qy 24 QDSLFS SESDNS LYFT YSGQSNTLEVRDLTYQVDIASQV- PWFEQLAQFKI PWRSHS 79 

I I : : I : : I I : : I I : : : : I I I I I 

Db 27 QGSVTGTEARHSLGVLHVSYS VSNRVGPW WNIKS 60 

Qy 80 SQDSCELGI-RNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQP 138 

| : I :::| : III:: hllll |: :|ll 1 = 11 |::::ll 
Db 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120 

Qy 139 STPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLR 198 

: I :: I I I I : I I I I I I I : I : I I : I : I : I I I : I I I 

Db 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRS-SADFYNKKVEAVMTELSLS 179 

Qy 199 QCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRL 258 

| : : | : I : I I I I I II I I I I I : I : : : I I I I I : I I I I I : : I I : I 

Db 180 HVADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAEL 239 

Qy 259 AKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPA 318 

| : : | : | : : : : I I I I I : : I : I I : : : I I : : I : : I : : I : I : I I I : I I I 
Db 240 ARRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPF 299 

Qy 319 DFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLT 378 

I I I : I I I I : I : I : M I : I :: I I II : I I = II 

Db 300 DFYMDLTSVDTQSREREIETYKRVQMLECAFKES-DIYHKILENIERARYLKTLPTVPFK 358 

Qy 379 LTQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGF -- LYYG 436 

| : | III: 1:11 II: : : : : : I I : I I 

Db 359 -TKDP PGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQ 409 

Qy 437 HGAKQLS FMDTAALLFMI GALI PFNVILDWS KCHS ERSMLYYELEDGLYTAGPYFFAKI 496 

: : : | I I : : I : : I : I : I : : I : I I I I I : 

Db 410 NNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYV 469 

Qy 497 LGELPEHCAYVIIYAMPIYWLTNLRPVPELF LL -- HFLLVWLVVFCCRTMALAA 548 

III : I :: II II I I I I : : I I : I 

Db 470 LHVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFL TLVLLG 523 

Qy 549 SAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQIQFN 608 

| : : | : : : I I : I : : I : : I : : I I : : I 

Db 524 IVQNPNI-VNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILWNEFY 582 

Qy 609 GHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISYGFL 654 

| Ml :|: :||: | | |: : II 

Db 583 GL NFTCGGSNTSML NHPMCAITQGVQFIEKTCPGATSRFTANFL 626 



Qy 655 FLY 657 

I I 

Db 627 ILY 629 



RESULT 11 
AAE13308 

ID AAE13308 standard; protein; 652 AA. 
XX 

AC AAE13308; 
XX 

DT 12-FEB-2002 (first entry) 
XX 

DE Mouse sitosterolaemia susceptibility gene (SSG) protein variant #1. 
XX 

KW Mouse; sitosterolaemia susceptibility gene; SSG; atherosclerosis; mutein; 

KW sterol-related disorder; hyperlipidaemia; hypercholesterolaemia; mutant; 

KW gall stone; coronary heart disease; cardiovascular disease; arthritis; 

KW xanthoma; haemolytic anaemia; transgenic animal; therapy; variant. 
XX 

OS Mus sp . 

OS Synthetic. 
XX 

FH Key Location/Qualifiers 

FT Misc-difference 17 

FT /note= "Wild type Ile substituted with Leu" 
XX 

PN WO200179272-A2. 
XX 

PD 25-OCT-2001. 
XX 

PF 18-APR-2001; 2001WO-US012758 . 
XX 

PR 18-APR-2000; 2000US-0198465P . 

PR 15-MAY-2000; 2000US-0204234P . 
XX 

PA (TULA-) TULARIK INC. 
XX 

PI Tian H, Schultz J, Shan B; 
XX 

DR WPI; 2002-017598/02. 
XX 

PT Novel sitosterolemia susceptibility gene polypeptide and polynucleotide, 

PT useful for screening a compound that increases the level of expression or 

PT activity of SSG polypeptide for treating sterol-related disorder. 
XX 

PS Disclosure; Page; 105pp; English. 
XX 

CC The invention relates to an isolated Sitosterolaemia Susceptibility Gene 

CC (SSG) polypeptide. SSG is a member of adenosine triphosphate (ATP) 

CC binding cassette (ABC) family cholesterol transporter. SSG is useful for 

CC identifying a compound useful in the treatment or prevention of a sterol- 

CC related disorder, including sitosterolaemia, hyperlipidaemia, 

CC hypercholesterolaemia, gall stones, HDL deficiency, atherosclerosis or 

CC nutritional deficiencies. SSG is also useful for treating cholesterol- 



CC associated diseases or conditions including coronary heart disease and 

CC other cardiovascular diseases, and sitosterolaemia-associated condition 

CC including arthritis, xanthomas and chronic haemolytic anaemia. SSG 

CC expression cassette is useful in the production of transgenic non-human 

CC animals. SSG genes and their homologues are useful as tools for a number 

CC of applications including diagnosing sitosterolaemia and other 

CC cardiovascular disorders, for forensics and paternity determinations, and 

CC for treating any of a large number of SSG associated diseases. The 

CC present sequence is mouse SSG protein variant obtained by replacing Ile17 

CC with Leu. Note: The present sequence is not shown in the specification 

CC but is derived from mouse SSG protein referred as SEQ ID NO: 1 (AAE13289) 

CC and shown in figure 7 of the specification 

XX 

SQ Sequence 652 AA; 

Query Match 20.1%; Score 701.5; DB 5; Length 652; 

Best Local Similarity 29.1%; Pred. No. 8.9e-60; 

Matches 194; Conservative 131; Mismatches 245; Indels 97; Gaps 19; 

Qy 24 QDSLFSSESDNS— LYFTYSGQSNTLEVRDLTYQVDIASQV-PWFEQLAQFKIPWRSHS 79 

II: :|: :| |: :|l --I M II 
Db 27 QGSVTGTEARHSLGVLHVSYS VSNRVGPW WNIKS 60 

Qy 80 SQDSCELGI-RNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQP 138 

| : | : : : | : I I I : : I : I I I I I : : I I I I : I I I : : : : I I 

Db 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120 

Qy 139 STPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLR 198 

: I : : I I I I : I I I I I I I : I : I I : I :|:|| |: II I 

Db 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRS-SADFYNKKVEAVMTELSLS 179 

Qy 199 QCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRL 258 

I : : | : I : I I I I I I I M I I I : I : : : I I I I I : I II I I : : I I : I 

Db 180 HVADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAEL 239 

Qy 259 AKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPA 318 

| : : I : I : : : : I I I I I : : I : I I : : : I I : : I : : I : : I : I : I I I : I I I 
Db 240 ARRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPF 299 

Qy 319 DFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLT 378 

|||:||||:l thill: I :: I I II I I 

Db 300 DFYMDLTSVDTQSREREIETYKRVQMLECAFKE SDIYHKI-LENIERARYLKTLP 353 

Qy 379 L TQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGF -- 432 

: | : | III: Ml I I : :::■:: I I : I 

Db 354 MVPFKTKDP PGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYL 405 

Qy 433 LYYGHGAKQLS FMDTAALLFMI GALI PFNVI LDWSKCHSERSMLYYELEDGLYTAGPYF 492 

| : : : | I I : : I : : I : I : I : : I : I I I I 

Db 406 LRVQNNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQML 465 

Qy 493 FAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELF LL -- HFLLVWLVVFCCRTM 544 

I :| || :|:: II II I | | | : :| |: 
Db 466 LAYVLHVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFL TL 519 

Qy 545 ALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQ 604 

| | : : I : : :||: |: : | : :| ::| |: 



Db 520 VLLGIVQNPNI-VNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILW 578 

Qy 605 IQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISY 651 

: I I III : I : : I I : I I I : : 

Db 579 NEFYGL NFTCGGSNTSML NHPMCA -- ITQGVQFIEKTCPGATSRFT 622 

Qy 652 -GFLFLY 657 

I I I I 

Db 623 ANFLILY 629 



RESULT 12 
AAE13289 

ID AAE13289 standard; protein; 652 AA. 
XX 

AC AAE13289; 
XX 

DT 12-FEB-2002 (first entry) 
XX 

DE Mouse sitosterolaemia susceptibility gene (SSG) protein. 
XX 

KW Mouse; sitosterolaemia susceptibility gene; SSG; atherosclerosis; 

KW sterol-related disorder; hyperlipidaemia; hypercholesterolaemia; 

KW gall stone; coronary heart disease; cardiovascular disease; arthritis; 

KW xanthoma; haemolytic anaemia; transgenic animal; chromosome 17; therapy. 

XX 

OS Mus sp. 
XX 

PN WO200179272-A2. 
XX 

PD 25-OCT-2001. 
XX 

PF 18-APR-2001; 2001WO-US012758 . 
XX 

PR 18-APR-2000; 2000US-0198465P. 

PR 15-MAY-2000; 2000US-0204234P. 
XX 

PA (TULA-) TULARIK INC. 
XX 

PI Tian H, Schultz J, Shan B; 
XX 

DR WPI; 2002-017598/02. 

DR N-PSDB; AAD22008. 
XX 

PT Novel sitosterolemia susceptibility gene polypeptide and polynucleotide, 

PT useful for screening a compound that increases the level of expression or 

PT activity of SSG polypeptide for treating sterol-related disorder. 
XX 

PS Claim 19; Fig 7; 105pp; English. 
XX 

CC The invention relates to an isolated Sitosterolaemia Susceptibility Gene 

CC (SSG) polypeptide. SSG is a member of adenosine triphosphate (ATP) 

CC binding cassette (ABC) family cholesterol transporter. SSG is useful for 

CC identifying a compound useful in the treatment or prevention of a sterol- 

CC related disorder, including sitosterolaemia, hyperlipidaemia, 

CC hypercholesterolaemia, gall stones, HDL deficiency, atherosclerosis or 

CC nutritional deficiencies. SSG is also useful for treating cholesterol- 



CC associated diseases or conditions including coronary heart disease and 

CC other cardiovascular diseases, and sitosterolaemia-associated condition 

CC including arthritis, xanthomas and chronic haemolytic anaemia. SSG 

CC expression cassette is useful in the production of transgenic non-human 

CC animals. SSG genes and their homologues are useful as tools for a number 

CC of applications including diagnosing sitosterolaemia and other 

CC cardiovascular disorders, for forensics and paternity determinations, and 

CC for treating any of a large number of SSG associated diseases. The 

CC present sequence is mouse SSG protein. Mouse SSG is located on chromosome 

CC 17 

XX 

SQ Sequence 652 AA; 

Query Match 20.1%; Score 701.5; DB 5; Length 652; 

Best Local Similarity 29.1%; Pred. No. 8.9e-60; 

Matches 194; Conservative 131; Mismatches 245; Indels 97; Gaps 19; 
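
Note on the statistics above: the "Query Match" percentage appears to be the alignment score expressed as a fraction of the query's perfect (self-alignment) score. The sketch below checks that reading. It assumes the perfect score is 3494, the value printed in the header of the later search in this listing, which seems to use the same 672-residue query; both the formula and the function name are assumptions for illustration.

```python
PERFECT_SCORE = 3494  # assumed: taken from the "Perfect score" header line


def query_match_percent(score, perfect_score=PERFECT_SCORE):
    """Alignment score as a percentage of the query's self-alignment score."""
    return 100.0 * score / perfect_score


# Score reported for this result (AAE13289).
print(f"Query Match: {query_match_percent(701.5):.1f}%")
```

With these assumptions the computed value rounds to the 20.1% reported here, and scores of 705 and 697 round to the 20.2% and 19.9% reported for results 9 and 15.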

Qy 24 QDSLFSSESDNS LYFTYSGQSNTLEVRDLTYQVDIASQV-PWFEQLAQFKIPWRSHS 79 

II: :|: :| |: :|| I I II 
Db 27 QGSVTGTEARHSLGVLHVSYS- VSNRVGPW WNIKS 60 

Qy 80 SQDSCELGI-RNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQP 138 

| : | : : : I : I I I : : I : I I I I I : : I II I : I I I : : : : I I 

Db 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120 

Qy 139 STPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLR 198 

: I : : I I I I : M I I I I I : I : I I : I : I : I I I : I I I 

Db 121 LRRDQFQDCFSWLQSDVFLSSLTVRETLRYTAMLALCRS-SADFYNKKVEAVMTELSLS 179 

Qy 199 QCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRL 258 

I: :|: hi II I I I : I : : : II II I : I I I lh :l hi 

Db 180 HVADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAEL 239 

Qy 259 AKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPA 318 

|: :|:|::::|||||::|: II : ::| I :: I : : I : : I : I : I I I : I I I 
Db 240 ARRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPF 299 

Qy 319 DFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLT 378 

I I I : I I I I : I : I : I I I : I I I I I I I : : : : : I 

Db 300 DFYMDLTSVDTQSREREIETYKRVQMLECAFKE SDIYHKI-LENIERARYLKTLP 353 

Qy 379 L TQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGF -- 432 

: |:| III : |:|| I I: ::: : :| I : I 

Db 354 MVPFKTKDP PGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYL 405 

Qy 433 LYYGHGAKQLS FMDTAALLFMI GALI P FNVI LDWS KCHS ERSMLYYELEDGL YTAGP YF 492 

| : : : I I I : : I : : I : I : h : I : I I I I 

Db 406 LRVQNNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQML 465 

Qy 493 FAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELF LL -- HFLLVWLVVFCCRTM 544 

I : I M : I : : II II I I I I : : I I : 

Db 466 LAYVLHVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFL TL 519 

Qy 545 ALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQ 604 

| | : : I : : : I I : I : : I : : I : : I I : 

Db 520 VLLGIVQNPNI-VNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILW 578 



Qy 605 IQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISY 651 

: I I III : I : : I I : I I I : : 
Db 579 NEFYGL NFTCGGSNTSML NHPMCA ITQGVQFIEKTCPGATSRFT 622 

Qy 652 -GFLFLY 657 

I I I I 

Db 623 ANFLILY 629 



RESULT 13 
AAE31702 

ID AAE31702 standard; protein; 652 AA. 
XX 

AC AAE31702; 
XX 

DT 24-MAR-2003 (first entry) 
XX 

DE Mouse ABCG5 protein. 
XX 

KW ABC family cholesterol transporter; ABCG8; sterol-related disorder; 

KW sitosterolaemia; hyperlipidaemia; hypercholesterolaemia; gall stone; 

KW HDL deficiency; atherosclerosis; nutritional deficiency; gene therapy; 

KW mouse; ATP-binding cassette; sitosterolaemia susceptibility gene; SSG; 

KW ABCG5. 
XX 

OS Mus sp. 
XX 

PN WO200281691-A2. 
XX 

PD 17-OCT-2002. 
XX 

PF 20-NOV-2001; 2001WO-US043823. 
XX 

PR 20-NOV-2000; 2000US-0252235P . 

PR 28-NOV-2000; 2000US-0253645P . 
XX 

PA (TULA-) TULARIK INC. 

PA (TEXA ) UNIV TEXAS SYSTEM. 

XX 

PI Hobbs HH, Shan B, Barnes R, Tian H; 
XX 

DR WPI; 2003-058548/05. 

DR N-PSDB; AAD48880. 
XX 

PT New ABCG8 polypeptides and nucleic acids, useful for treating sterol- 

PT related disorders e.g. sitosterolemia, hypercholesterolemia, 

PT hyperlipidemia, gall stones, HDL deficiency, atherosclerosis, or 

PT nutritional deficiencies. 

XX 

PS Claim 28; Page 74; 94pp; English. 
XX 

CC The invention relates to ATP-binding cassette (ABC) family cholesterol 

CC transporter, ABCG8 polypeptides and polynucleotides. The invention also 

CC provides ABCG5 polypeptides and polynucleotides. ABCG5 gene is also known 

CC as sitosterolaemia susceptibility gene (SSG) . Sequences of the invention 

CC are useful for treating or preventing sterol-related disorders such as 

CC sitosterolaemia, hyperlipidaemia, hypercholesterolaemia, gall stones, HDL 



CC deficiency, atherosclerosis and nutritional deficiencies. They are also 

CC useful in gene therapy. The present sequence is mouse ABCG5 protein 
XX 

SQ Sequence 652 AA; 

Query Match 20.1%; Score 701.5; DB 6; Length 652; 

Best Local Similarity 29.1%; Pred. No. 8.9e-60; 

Matches 194; Conservative 131; Mismatches 245; Indels 97; Gaps 19; 

Qy 24 QDSLFSSESDNS LYFTYSGQSNTLEVRDLTYQVDIASQV- PWFEQLAQFKIPWRSHS 79 

||: :|: :| |: :|| ::::| II II 
Db 27 QGSVTGTEARHSLGVLHVSYS VSNRVGPW WNIKS 60 

Qy 80 SQDSCELGI-RNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQP 138 

I : | : : : I : I I I : : I : II II I : : I I I I : I I | : : : : | | 

Db 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120 

Qy 139 STPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLR 198 

: I : : I I I I : I I I I I I I : I : I I : I : I : I I I : I I I 
Db 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRS-SADFYNKKVEAVMTELSLS 179 

Qy 199 QCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRL 258 

I : : I : I : I M II I I I I I I I : I : : : I I I I I : M I I I : : I I : I 

Db 180 HVADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAEL 239 

Qy 259 AKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPA 318 

I : : I : I : : : : I I I I I : : I : I I : : : I I : : I : : I : : I : I : I I I : I I I 
Db 240 ARRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPF 299 

Qy 319 DFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLT 378 

t I I : I I I I : I : I : I I I : I — I I II I I : : — =1 

Db 300 DFYMDLTSVDTQSREREIETYKRVQMLECAFKE SDIYHKI-LENIERARYLKTLP 353 

Qy 379 L TQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGF -- 432 

: I : I III: I : I I I I : : : : : : I I : I 

Db 354 MVPFKTKDP PGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYL 405 

Qy 433 LYYGHGAKQLS FMDTAALLFMI GALI PFNVI LDWSKCHSERSMLYYELEDGLYTAGP YF 492 

I : : : I I I : : I : : I : I : I > = I : I I I I 

Db 406 LRVQNNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQML 465 

Qy 493 FAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELF LL -- HFLLVWLVVFCCRTM 544 

I : I I I : I :: II II I ! j I : : I I : 

Db 466 LAYVLHVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFL TL 519 

Qy 545 ALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQ 604 

| | ::| : : :||: |: : I : :| ::| I: 

Db 520 VLLGIVQNPNI-VNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILW 578 

Qy 605 IQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISY 651 

: I I III : I : : I I : I I I : : 
Db 579 NEFYGL NFTCGGSNTSML NHPMCA ITQGVQFIEKTCPGATSRFT 622 

Qy 652 -GFLFLY 657 

I I I I 

Db 623 ANFLILY 629 



RESULT 14 
AAE13309 

ID AAE13309 standard; protein; 652 AA. 
XX 

AC AAE13309; 
XX 

DT 12-FEB-2002 (first entry) 
XX 

DE Mouse sitosterolaemia susceptibility gene (SSG) protein variant #2. 
XX 

KW Mouse; sitosterolaemia susceptibility gene; SSG; atherosclerosis; mutein; 

KW sterol-related disorder; hyperlipidaemia; hypercholesterolaemia; mutant; 

KW gall stone; coronary heart disease; cardiovascular disease; arthritis; 

KW xanthoma; haemolytic anaemia; transgenic animal; therapy; variant. 
XX 

OS Mus sp. 

OS Synthetic. 
XX 

FH Key Location/Qualifiers 

FT Misc-difference 28 

FT /note= "Wild type Gly substituted with Ala" 
XX 

PN WO200179272-A2. 
XX 

PD 25-OCT-2001. 
XX 

PF 18-APR-2001; 2001WO-US012758 . 
XX 

PR 18-APR-2000; 2000US-0198465P . 

PR 15-MAY-2000; 2000US-0204234P . 
XX 

PA (TULA-) TULARIK INC. 
XX 

PI Tian H, Schultz J, Shan B; 

XX 

DR WPI; 2002-017598/02. 

XX 

PT Novel sitosterolemia susceptibility gene polypeptide and polynucleotide, 

PT useful for screening a compound that increases the level of expression or 

PT activity of SSG polypeptide for treating sterol-related disorder. 
XX 

PS Disclosure; Page; 105pp; English. 
XX 

CC The invention relates to an isolated Sitosterolaemia Susceptibility Gene 

CC (SSG) polypeptide. SSG is a member of adenosine triphosphate (ATP) 

CC binding cassette (ABC) family cholesterol transporter. SSG is useful for 

CC identifying a compound useful in the treatment or prevention of a sterol- 

CC related disorder, including sitosterolaemia, hyperlipidaemia, 

CC hypercholesterolaemia, gall stones, HDL deficiency, atherosclerosis or 

CC nutritional deficiencies. SSG is also useful for treating cholesterol- 

CC associated diseases or conditions including coronary heart disease and 

CC other cardiovascular diseases, and sitosterolaemia-associated condition 

CC including arthritis, xanthomas and chronic haemolytic anaemia. SSG 

CC expression cassette is useful in the production of transgenic non-human 

CC animals. SSG genes and their homologues are useful as tools for a number 

CC of applications including diagnosing sitosterolaemia and other 



CC cardiovascular disorders, for forensics and paternity determinations, and 

CC for treating any of a large number of SSG associated diseases. The 

CC present sequence is mouse SSG protein variant obtained by replacing Gly28 

CC with Ala. Note: The present sequence is not shown in the specification 

CC but is derived from mouse SSG protein referred as SEQ ID NO: 1 (AAE13289) 

CC and shown in figure 7 of the specification 

XX 

SQ Sequence 652 AA; 

Query Match 20.1%; Score 701; DB 5; Length 652; 

Best Local Similarity 29.7%; Pred. No. 9.9e-60; 

Matches 191; Conservative 128; Mismatches 244; Indels 80; Gaps 18; 

Qy 45 NTLEVRDLTYQVDIASQV-PWFEQLAQFKIPWRSHSSQDSCELGI-RNLSFKVRSGQMLA 102 

: : I I : : | I : : : I I I I II : I : : : I : I I I : : 

Db 37 HSLGVLHVSYSV— SNRVGPW WNIKSCQQKWDRQILKDVSLYIESGQIMC 84 

Qy 103 IIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLT 162 

I : I I I I I I I I I : I I | : : : : I I : I : : I I I I : I I 

Db 85 ILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQFQDCFSYVLQSDVFLSSLT 144 

Qy 163 VRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVS 222 

| I I I I : I : I I : I : I : I I I : I I I I : : I : hi I I I I I II 

Db 145 VRETLRYTAMLALCRS-SADFYNKKVEAVMTELSLSHVADQMIGSYNFGGISSGERRRVS 203 

Qy 223 IGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDL 282 

I | | | : I : : : I I I I I : I I I I I : : I | : | | : : I : I : : : : I I I I I : : I : I I 
Db 204 IAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQPRSELFQHFDK 263 

Qy 283 VLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKA 342 

: : : I I : : I : : I : : I : I : I I I : I I I I I I : I I I I : I : I : I I I : I = : 
Db 264 IAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQSREREIETYKRV 323 

Qy 343 QSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLTL TQDTDCGTAVELPGMIEQF 398 

I I I I I I : : : : : | : |:| III : 

Db 324 QMLECAFKE SDIYHKI-LENIERARYLKTLPMVPFKTKDP PGMFGKL 369 

Qy 399 STLIRRQISNDFRDLPTLLIHGSEACLMSLIIGF — LYYGHGAKQLSFMDTAALLFMIGA 456 

|:|| ||: ::: : :| I : I I : : : I ||: : 
Db 370 GVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDRVGLLYQLVG 429 

Qy 457 LIPFNVILDWSKCHSERSMLYYELEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYW 516 

I: :|: I: |:: I : I I I I I :| II :|:: I I 

Db 430 ATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIATVIFSSVCYW 489 

Qy 517 LTNLRPVPELF LL -- HFLLVWLVVFCCRTMALAASAMLPTFHMSSFFCNALYNS 568 

II I | | | : : | I : I I :: I : 

Db 490 TLGLYPEVARFGYFSAALLAPHLIGEFL TLVLLGIVQNPNI-VNSIVALLSISG 542 

Qy 569 FYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQIQFNGHLYTTQIGNFTFSILGDTM 628 

: : I I : I : : I : : I : : I I : : I I II I : I 

Db 543 LLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILWNEFYGL NFTCGGSNTSM 595 

Qy 629 ISAMDLNSHPLYAIYLIVIGISY GFLFLY 657 

: : I I : I I I : : I I I I 

Db 596 L NHPMCA ITQGVQFIEKTCPGATSRFTANFLILY 629 



RESULT 15 
AAU96993 

ID AAU96993 standard; protein; 651 AA. 
XX 

AC AAU96993; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE Human ABCG5 mutant R419P protein sequence. 
XX 

KW Human; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease; 

KW mutant; mutein. 
XX 

OS Homo sapiens. 

OS Synthetic. 
XX 

FH Key Location/Qualifiers 

FT Misc-difference 419 

FT /note= "Wild-type Arg substituted by Pro" 
XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859 . 
XX 

PR 25-SEP-2000; 2000US-0235268P . 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Claim 10; Page; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 



CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present amino 

CC acid sequence represents the human ABCG5 mutant R419P protein of the 

CC invention. Note: This sequence is not shown in the specification but is 

CC derived from the wild-type human ABCG5 protein (AAU96984) given on pages 

CC 35-36 of the specification 

XX 

SQ Sequence 651 AA; 

Query Match 19.9%; Score 697; DB 5; Length 651; 

Best Local Similarity 29.1%; Pred. No. 2.5e-59; 

Matches 195; Conservative 129; Mismatches 263; Indels 84; Gaps 18; 

Qy 17 LQDASGLQDSL FSSESDNSLYFTYSGQSNTLEVRDLTYQVDIASQVPWFEQLAQFK 72 

II I I I I :: :l I :: I : I I I I : : • : 

Db 15 LQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVR PWWD-ITSCR 61 

Qy 73 IPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGR-GHGGKMKSGQ 131 

| : : : : I I I I I : : I : I I M I : : I I I : : I I I I I : 

Db 62 QQWTRQI LKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTF-LGE 112 

Qy 132 IWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDV 191 

:::||: : : I : : I I I I I : I I I I I I I : I : : I : I : i I I 

Db 113 VYVNGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAI-RRGNPGSFQKKVEAV 171 

Qy 192 IAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNL 251 

: | | | I I : : I I : I : I I I I I I I I I I M : I : : : I I M : I I I 11= : 
Db 172 MAELSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQI 231 

Qy 252 VTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPC 311 

I I | | : | | : I : : : : I I I I I : : I : I I I : : : : I I : I : I : : I I : I I 
Db 232 VVLLVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPC 291 

Qy 312 PRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTS 371 

I : I I I I I I : I M I : I : I I I I I : I : : I = : : = = I : 

Db 292 PEHSNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSA ICHKTLKNIERM 345 

Qy 372 THTVSLTL TQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMS 427 

I : I : I : I : I I : : I : I I I I : : : : : I 

Db 346 KHLKTLPMVPFKTKDS PGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMG 397 

Qy 428 LIIGFLYYGHGAKQL--SFMDTAALLFMIGALIPFNVILDVVSKCHSERSMLYYELEDGL 485 

| : I : | : | I I : I : : I : I : I :: I : I I I 

Db 398 LFLLFFVLRVRSNVLKGAIQDPVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGL 457 

Qy 486 YTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELF LL — HFLLVWLV 537 

| I I II : I : : II II I | | : : I 

Db 458 YQKWQMMIAYALHVLPFSWATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFL- 516 

Qy 538 VFCCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRW 597 

|:| I ::| : :||: |: : | II :| :: 

Db 517 TLVLLGIVQNPNI-VNSWALLSIAGVXVGSGFLRNIQEMPIPFKIISYFTFQKY 570 



Qy 598 CFSGLMQIQFNGHLYTTQIGNFTFSILGDTM I SAMDLNSHPLY 640 

| I : : I I : I I : I : : I I : I II 

Db 571 CSEILWNEFYGLNFT--CGSSNVSVTTNPMCAFTQGIQFIEKTCPGATSRFTMNFLILY 628 

Qy 641 AIY--LIVIGI 649 

: |:::|| 
Db 629 SFIPALVILGI 639 
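
The "Best Local Similarity" figure reported for each hit appears to be the number of identical columns divided by the total number of alignment columns. The report does not state this formula; it is an assumption that happens to reproduce the printed percentages. A minimal Python sketch (the function name is hypothetical):

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Percent identity over all alignment columns.

    Assumed relation (not stated in the report): the alignment length is
    the sum of the four column counts printed on the "Matches" line.
    """
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns


# Counts copied from the "Matches ...; Indels ..." line of the hit above.
similarity = best_local_similarity(195, 129, 263, 84)
print(f"{similarity:.1f}%")
```

Under the same assumption, the "Gaps" count is not part of the denominator: it counts gap runs, while "Indels" counts gapped columns.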



Search completed: February 27, 2004, 06:44:19 
Job time : 49.9637 secs 



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM protein - protein search, using sw model 

Run on: February 27, 2004, 07:11:48 ; Search time 15.2266 Seconds 
(without alignments) 
2278.426 Million cell updates/sec 

Title: US-09-989-981A-4 

Perfect score: 3494 

Sequence: 1 MAEKTKEETQLWNGTVLQDA FLFLYYLSLKLIKQKSIQDW 672 

Scoring table: BLOSUM62 
Gapop 10.0 , Gapext 0.5 

Searched: 389414 seqs, 51625971 residues 

Total number of hits satisfying chosen parameters: 389414 

Minimum DB seq length: 0 

Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0% 
Maximum Match 100% 
Listing first 45 summaries 

Database : Issued_Patents_AA:* 

1: /cgn2_6/ptodata/2/iaa/5A_COMB.pep:* 

2: /cgn2_6/ptodata/2/iaa/5B_COMB.pep:* 

3: /cgn2_6/ptodata/2/iaa/6A_COMB.pep:* 

4: /cgn2_6/ptodata/2/iaa/6B_COMB.pep:* 

5: /cgn2_6/ptodata/2/iaa/PCTUS_COMB.pep:* 

6: /cgn2_6/ptodata/2/iaa/backfiles1.pep:* 

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
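
The header figures can be cross-checked against one another. The sketch below assumes two relations that the report itself does not spell out: that "Query Match" is a hit's score as a percentage of the perfect (self-match) score, and that the throughput figure is query length times database residues divided by search time. Both assumptions reproduce the printed values; the function names are hypothetical.

```python
# Values copied from the run header above.
PERFECT_SCORE = 3494          # score of the query aligned to itself
QUERY_LENGTH = 672            # residues in US-09-989-981A-4
DB_RESIDUES = 51_625_971      # residues searched
SEARCH_TIME_S = 15.2266       # seconds, without alignments


def query_match_percent(score, perfect=PERFECT_SCORE):
    """Assumed relation: hit score as a percentage of the perfect score."""
    return 100.0 * score / perfect


def million_cell_updates_per_sec(query_len=QUERY_LENGTH,
                                 residues=DB_RESIDUES,
                                 seconds=SEARCH_TIME_S):
    """Assumed relation: one dynamic-programming cell per query/database residue pair."""
    return query_len * residues / seconds / 1e6


print(f"{query_match_percent(657.5):.1f}%")        # first hit in the list below
print(f"{million_cell_updates_per_sec():.1f} Million cell updates/sec")
```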
ALIGNMENTS 



RESULT 1 
US-09-245-808-1 

; Sequence 1, Application US/09245808 
; Patent No. 6313277 
; GENERAL INFORMATION: 
; APPLICANT: Doyle, L. Austin 
; APPLICANT: Abruzzo, Lynne V. 
; APPLICANT: Ross, Douglas D. 
; TITLE OF INVENTION: Breast Cancer Resistance Protein (BCRP) and DNA which 
; TITLE OF INVENTION: encodes it 
; FILE REFERENCE: Ross UMb conversion 
; CURRENT APPLICATION NUMBER: US/09/245,808 
; CURRENT FILING DATE: 1999-02-05 
; EARLIER APPLICATION NUMBER: 60/073763 
; EARLIER FILING DATE: 1998-02-05 
; NUMBER OF SEQ ID NOS: 7 
; SOFTWARE: PatentIn Ver. 2.0 
; SEQ ID NO 1 
; LENGTH: 655 
; TYPE: PRT 
; ORGANISM: Human MCF-7/AdrVp cells 
US-09-245-808-1 

Query Match 18.8%; Score 657.5; DB 4; Length 655; 

Best Local Similarity 27.2%; Pred. No. 2.8e-60; 

Matches 185; Conservative 141; Mismatches 270; Indels 85; Gaps 21; 

Qy 28 FSSESDNSL-YFTYSGQSNTLEVRDLTYQVDIASQVPWFEQLAQFKIPWRSHSSQDSCEL 86 

I : : I I I I : I :: |:| : I :| I :: 
Db 20 FPATASNDLKAFT E GAVL S FHN I C YRVKL K S G F LPCRKPVEKEI 63 

Qy 87 GIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVRK 146 

: I : : : : I : I I : I : I I : : I I I I I : I : I I : I I I I I 

Db 64 -LSNINGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPSGL-SGDVLINGAPRPANF--K 118 

Qy 147 C-VAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRV 205 

I :| I I :: Mill I I I :|| I : ::::|: || || I : |:::| 
Db 119 CNSGYVVQDDVVMGTLTVRENLQFSAALRLATTMTNHEKNERINRVIQELGLDKVADSKV 178 

Qy 206 GNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLV 265 

I ::|| II I II I :| I I I :| II I I I I I :l I I I I I : I |::| I : 

Db 179 GTQFIRGVSGGERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTI 238 

Qy 266 LISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPADFYVDLT 325 

: |:IM ||:||| : I: 'II : : I I I : : I I I I : I |:|||||::|: 
Db 239 IFSIHQPRYSIFKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNNPADFFLDII 298 

Qy 326 SIDRRS KEREVATVEK AQSLAALFLEKVQGFDDFL--WKAEAKELN 369 

: I : II: I : II ::: I III :|: 

Db 299 NGDSTAVALNREEDFKATEIIEPSKQDKPLIEKLAEIYVN SSFYKETKAELHQLS 353 

Qy 370 TSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLI 429 

: | : : : | : I : : I I : : : : I : 

Db 354 GGEKKKKITVFKEI SYTTS FCHQLRWVSKRSFKNLLGNPQASIAQIIVTWLGLV 408 

Qy 430 IGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDWSKCHS ERSMLYY 479 

I I : I : I : | : | | : :: I I I : : : 

Db 409 IGAIYFGLKNDSTGIQNRAGVLFFL TTNQCFSSVSAVELFWEKKLFIH 457 

Qy 480 ELEDGLYTAGPYFFAKILGE-LPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLVV 538 

I || II I : I : I I II: : I : : hi : I : : : I 

Db 458 EYISGYYRVSSYFLGKLLSDLLPMTMLPSIIFTCIVYFMLGLKPKADAFFVMMFTLMMVA 517 

Qy 539 FCCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNL--WIVPAWISKLSFLR 596 

: :|||[ :| ::: : : : I : : I I : I : : I : II 

Db 518 YSASSMALAIAAGQSVVSVATLLMTICFVFMMIFSGLLVNLTTIASWL--SWLQYFSIPR 575 

Qy 597 WCFSGLMQIQFNGHLYTTQIG NFTFSILGDTMI — SAMDLNSHPLYAIYLIVI 647 

: I : I : I I : : | : : | : : : I I : I : : : : 

Db 576 YGFTALQHNEFLGQNFCPGLNATGNNPCNYA-TCTGEEYLVKQGIDLSPWGLWKNHVALA 634 

Qy 648 GISYGFLFLYYLSLKLIKQKS 668 

: I I : I I I : I : I 
Db 635 CMIVIFLTIAYLKLLFLKKYS 655 



RESULT 2 
US-09-767-594-1 

Sequence 1, Application US/09767594 
Patent No. 6521635 
GENERAL INFORMATION: 
APPLICANT: Bates, Susan 
APPLICANT: Robey, Robert 

APPLICANT: The Government of the United States of America 
APPLICANT: as represented by the Secretary of the 
APPLICANT: Department of Health and Human Services 

TITLE OF INVENTION: Inhibition of MXR Transport by Acridine Derivatives 
FILE REFERENCE: 015280-4 02100US 
CURRENT APPLICATION NUMBER: US/09/767,594 
CURRENT FILING DATE: 2001-01-22 
PRIOR APPLICATION NUMBER: US 60/177,410 
PRIOR FILING DATE: 2000-01-20 
NUMBER OF SEQ ID NOS: 2 
SOFTWARE: PatentIn Ver. 2.1 
SEQ ID NO 1 
LENGTH: 655 
TYPE: PRT 

ORGANISM: Homo sapiens 
FEATURE: 

OTHER INFORMATION: human mitoxanthrone resistance (MXR) /BRCP/ABCP 
OTHER INFORMATION: protein 
US-09-767-594-1 

Query Match 18.7%; Score 655; DB 4; Length 655; 

Best Local Similarity 26.8%; Pred. No. 5.2e-60; 

Matches 186; Conservative 138; Mismatches 271; Indels 100; Gaps 20; 



[Alignment rows starting at Qy 32, 73, 133, 192, 252 and Db 3, 53, 165, 225: residues not recoverable] 

Db 107 LINGAPRPANF--KCNSGYVVQDDVVMGTLTVRENLQFSAALRLATTMTNHEKNERINRV 164 

Qy 312 PRYSNPADFYVDLTSIDRRS KEREVATVEK AQSLAALFLEKVQGFD 357 

I : I I I I I :: I : : I : I I : I : II : : : 
Db 285 EAYNNPADFFLDIINGDSTAVALNREEDFKATEIIEPSKQDKPLIEKLAEIYVN S 339 



Qy 358 DFL--WKAEAKELNTSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPT 415 



I III :|: :h :: h I : : I I : 

Db 340 SFYKETKAELHQLSGGEKKKKITVFKEISYTTS FCHQLRWVSKRSFKNLLGNPQA 394 

Qy 416 LLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDVVSKCHS--- 472 

: :: 1:11 :|:| : I : I I : ::l I 

Db 395 SIAQIIVTWLGLVIGAIYFGLKNDSTGIQNRAGVLFFL TTNQCFSSVS 443 

Qy 473 ERSMLYYELEDGLYTAGPYFFAKILGE-LPEHCAYVIIYAMPIYWLTNLRPVP 524 

I : : : I II II I : I : I I II: : I : : hi 

Db 444 AVELFVVEKKLFIHEYISGYYRVSSYFLGKLLSDLLPMRMLPSIIFTCIVYFMLGLKPKA 503 

Qy 525 ELFLLHFLLVWLVVFCCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNL — 582 

: | : : : I : : I I I I : I : : : : : : I : : I I : 

Db 504 DAFFVMMFTLMMVAYSAS SMAIAIAAGQSVVSVATLLMTICFVFMMIFSGLLVNLTTIAS 563 

Qy 583 WIVPAWISKLSFLRWCFSGLMQIQFNGHLYTTQIG NFTFSILGDTMI--SAMD 633 

| : : I : I I : I : I : I I : : | : : I : : : I 

Db 564 WL--SWLQYFSIPRYGFTALQHNEFLGQNFCPGLNATGNNPCNYA-TCTGEEYLVKQGID 620 

Qy 634 LNSHPLYAIYLIVIGISYGFLFLYYLSLKLIKQKS 668 

I : I : : : : : I I : I I I : I : I 
Db 621 LSPWGLWKNHVALACMIVIFLTIAYLKLLFLKKYS 655 



RESULT 3 

US-09-614-912-138 

Sequence 138, Application US/09614912 
Patent No. 6677502 
GENERAL INFORMATION: 
APPLICANT: Allen, Steve 
APPLICANT: Rafalski, Antoni 
APPLICANT: Orozco, Buddy 
APPLICANT: Miao, Gou-Hau 
APPLICANT: Famodu, Omolayo O. 
APPLICANT: Lee, Jian Ming 
APPLICANT: Sakai, Hajime 
APPLICANT: Weng, Zude 
APPLICANT: Caimi, Perry G 
APPLICANT: Anderson, Shawn 

TITLE OF INVENTION: Plant Metabolism Genes 
FILE REFERENCE: BB1378 US NA 
CURRENT APPLICATION NUMBER: US/09/614,912 
CURRENT FILING DATE: 2000-07-12 
PRIOR APPLICATION NUMBER: 60/143,401 
PRIOR FILING DATE: 1999-07-12 
PRIOR APPLICATION NUMBER: 60/143,412 
PRIOR FILING DATE: 1999-07-12 
PRIOR APPLICATION NUMBER: 60/146,650 
PRIOR FILING DATE: 1999-07-30 
PRIOR APPLICATION NUMBER: 60/170,906 
PRIOR FILING DATE: 1999-12-15 
PRIOR APPLICATION NUMBER: 60/172,959 
PRIOR FILING DATE: 1999-12-21 
PRIOR APPLICATION NUMBER: 60/172,946 
PRIOR FILING DATE: 1999-12-21 
NUMBER OF SEQ ID NOS : 204 
SOFTWARE: Microsoft Office 97 



SEQ ID NO 138 
LENGTH: 617 
TYPE: PRT 
ORGANISM: Zea mays 
US-09-614-912-138 

Query Match 14.4%; Score 503; DB 4; Length 617; 

Best Local Similarity 25.8%; Pred. No. 6e-44; 

Matches 169; Conservative 130; Mismatches 243; Indels 112; Gaps 23; 

Qy 51 DLTYQVDIASQVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCG 110 

: : | ||: : : : : II :| :: I I : |::| II I 

Db 15 NVNYYVDMPAEM KHQGVQDDRLQLLREVTGSFRPGVLTALMGVSGAG 61 

Qy 111 RASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFI 170 

: : I : II : II II :: I I I I I = : hi I : I I I I : I : 

Db 62 KTTLMDVLAGRKTGGYIE-GDIRIAGYPKNQATFARISGYCEQNDIHSPQVTVRESLIYS 120 

Qy 171 AQMRLPRTFSQAQ RDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGV 225 

| : | | | : : : |::|: : I : M : hi I 

Db 121 AFLRLPGKIGDQEITDDIKMQFVDEVMELVELDNLRDALVGLPGITGLSTEQRKRLTIAV 180 

Qy 226 QLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFD-LVL 284 

:|: II I: : Ill: I — I : I I: = = ■ I I IN M 1 = 1 

Db 181 ELVANPSIIFMDEPTSGLDARAAAIVMRTVRNTVDTGRTWCTIHQPSIDIFESFDELLL 240 

Qy 285 LMTSGTPIYLGA AQQMVQYFTSI-GHP--CPRYSNPADFYVDLTSIDRRSKEREVA 337 

I | Ml : | : | | : | | : | I I :| III : ::::| II 
Db 241 LKRGGQVIYSGKLGRNSQKMVEYFEAIPGVPKIKDKY-NPATWMLEVSS VA 290 

Qy 338 TVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLTLTQDTDCGTAVELP 392 

| : : II I : I : I : I : : I 

Db 291 TEVRLKM DFAKYYETSDLYKQNKVLVNQLSQPEPGTSDLYFPTEYSQ 337 

Qy 393 GMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTAALLF 452 

IN : :| :| |: I |::|::| ::: I : I I 

Db 338 STIGQFKACLWKQWLTYWRSPDYNLVRYSFTLLVALLLGSIFWRIGT NMEDATTLGM 394 

Q y 453 MIGALIPFNVILDV-VSKCHS ERSMLYYELEDGLYTAGPYFFAKI LGELPEHC 504 

: | | | ::::::: I : II:: I I 1-1 = 1 M 1 = :: hi 
Db 395 VI GAM — YT AVM FIGINNCS T VQ P WS I E RT VF Y RE RAAGM Y SAMP Y AI AQWI E I P 449 

Qy 505 AYV IIYAMPIYWLTNLRPVPELFLLHFLLVWLVVFCCRTMALAASAMLPTF 555 

|| I : I I I : I : : I : : I : : : : I : : I 

Db 450 -YVFVQTTYYTLIVYAMMSFQWTAVKFFWFFFISYFSFLYFTYY GMMAVS I S PNH 503 

Qy 556 HMSSFFCNALYNSFYLTAGFMI NLDNLWIVPAWISKLSFLRWCFSGLMQIQFNGHLY 612 

: : I I I : : I I : I I I : II II h I I h 
Db 504 EVASIFAAAFFSLFNLFSGFFIPRPRIPGWWIWYYWICPLA WTVYGLI 551 

Qy 613 TTQIGNF — TFS I LGDTMI SAMDLNSH PLYAIYLIVIGISYGFLF 655 

|||: |: |:: : :| |: | |:: : : ||: 

Db 552 VTQYGDLEDLISVPGESEQTISYYVTHHFGYHRDFLPVIAPVLVLFAVFFAFLY 605 



RESULT 4 

US-09-614-912-140 



Sequence 140, Application US/09614912 
Patent No. 6677502 
GENERAL INFORMATION: 
APPLICANT: Allen, Steve 
APPLICANT: Rafalski, Antoni 
APPLICANT: Orozco, Buddy 
APPLICANT: Miao, Gou-Hau 
APPLICANT: Famodu, Omolayo O. 
APPLICANT: Lee, Jian Ming 
APPLICANT: Sakai, Hajime 
APPLICANT: Weng, Zude 
APPLICANT: Caimi, Perry G 
APPLICANT: Anderson, Shawn 

TITLE OF INVENTION: Plant Metabolism Genes 
FILE REFERENCE: BB1378 US NA 
CURRENT APPLICATION NUMBER: US/09/614,912 
CURRENT FILING DATE: 2000-07-12 
PRIOR APPLICATION NUMBER: 60/143,401 
PRIOR FILING DATE: 1999-07-12 
PRIOR APPLICATION NUMBER: 60/143,412 
PRIOR FILING DATE: 1999-07-12 
PRIOR APPLICATION NUMBER: 60/146,650 
PRIOR FILING DATE: 1999-07-30 
PRIOR APPLICATION NUMBER: 60/170,906 
PRIOR FILING DATE: 1999-12-15 
PRIOR APPLICATION NUMBER: 60/172,959 
PRIOR FILING DATE: 1999-12-21 
PRIOR APPLICATION NUMBER: 60/172,946 
PRIOR FILING DATE: 1999-12-21 
NUMBER OF SEQ ID NOS : 204 
SOFTWARE: Microsoft Office 97 
SEQ ID NO 140 
LENGTH: 1296 
TYPE: PRT 

ORGANISM: Oryza sativa 
US-09-614-912-140 

Query Match 13.9%; Score 485; DB 4; Length 1296; 

Best Local Similarity 24.0%; Pred. No. 1.7e-41; 

Matches 177; Conservative 152; Mismatches 271; Indels 138; Gaps 28; 

Qy 1 MAEKTKEETQ LWNGTVLQDASGLQD SLFSSESDNSLYFTYSGQSN 45 

: : | : I : I : : I I : : : : : I : I I : III 

Db 614 ISEETAKEAEGNGDARHTVRNGSTKSNGGNHKEMREMRLSARLSNSSSNGVSRLMSIGSN 673 

Qy 46 TLEVRDLTYQVDIASQVPWFEQLAQFKIPWRSHSSQDSCELGIRN 90 

:: |: | ||: ::: : I :| :|: 

Db 674 EAG P RRGMVL P FT P L SMS FD DVN Y YVDMP AEMK QQGWDDRLQL-LRD 720 

Qy 91 LSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVRKCVAH 150 

: : | : | : : | | | | : : | : M : I I I I : : I : I : I I : : : 
Db 721 VTGS FRP AVLTALMGVS GAGKTT LMDVLAGRKT GGY IE- GDMRI S G YP KNQET FARI S GY 77 9 

Qy 151 VRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQ RDKRVEDVIAELRLRQCANTRV 205 

1:1 I : I I I I : I : I : I I I : : : I : : I : : I : I 

Db 780 CEQNDIHSPQVTVRESLIYSAFLRLPEKIGDQEITDDIKIQFVDEVMELVELDNLKDALV 839 



Qy 206 GNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLV 265 

I : |:| :|:|::| I : I : I I I : : I I I I I I I I : I :: I: II 
Db 840 GLPGITGLSTEQRKRLTIAVELVANPSIIFMDEPTSGLDARAAAIVMRTVRNTVDTGRTV 899 

Qy 266 LISLHQPRSDIFRLFD-LVLLMTSGTPIYLGA AQQMVQYFTSI-GHP--CPRYSNP 317 

: ::||| III II hll I III :|:|::|| :| I I :| II 

Db 900 VCTIHQPSIDIFEAFDELLLLKRGGQVIYSGQLGRNSQKMIEYFEAIPGVPKIKDKY-NP 958 

Qy 318 ADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSL 377 

I : ::::|: :|: II : :| : 

Db 959 ATWMLEVSSV AAEVRLNMDFAEYYKTSDLYKQNKVLVN 996 

Qy 378 TLTQDTDCGTAVELP GMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGF 432 

I : I : : I III : : I : I I : I : I : : I 

Db 997 QLSQPEPGTSDLHFPTKYSQSTIGQFRACLWKQWLTYWRSPDYNLVRFSFTLFTALLLGT 1056 

Qy 433 LYYGHGAKQLSFMDTA-ALLFMIGALIPFNVILDV-VSKCHS ERSMLYYELED 483 

: : : | | I I : I : I I I : : : : : : : I : I I : : I I 

Db 1057 IFWKIGTK MGNANSLRMVIGAM--YTAVMFIGINNCATVQPIVSIERTVFYRERAA 1110 

Qy 484 GLYTAGP YFFAKI LGELP EHCAY — VIIYAMPIYWLTNLRPVPELFLLHFLLVWLW 538 

|:|:| II I::: hi II :|:lll : I : l—l " 

Db 1111 GMYSAMPYAIAQWMEIPYVFVQTAYYTLIVYAMMSFQWTAAKFFWFFFVSYFSFLYFTY 1170 

Qy 539 FCCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMI NLDNLWIVPAWISKLSFL 595 

: |:|:: I ::: I I |: I I : I I : II h I: 

Db 1171 YGMMTVAIS PNHEVAAIFAAAFYSLFNLFSRFFIPRPRIPKWWIWYYWLCPLA-- 1223 

Qy 596 RWCFSGLMQIQFNGHLYTTQIGNF--TFSILGDTMISAMDLNSH PLYAIYL 644 

I II: III: | : | : : : I I : I I 

Db 1224 -WTVYGLI VTQYGDLEQIISVPGQSNQTISYYVTHHFGYHRKFMPWAPVL 1273 

Qy 645 IVIGISYGFLFLYYLSLK 662 
:: : I |:| : :| 
Db 1274 VLFAVF--FAFMYAICIK 1289 



RESULT 5 

US-09-614-912-144 

Sequence 144, Application US/09614912 
Patent No. 6677502 
GENERAL INFORMATION: 
APPLICANT: Allen, Steve 
APPLICANT: Rafalski, Antoni 
APPLICANT: Orozco, Buddy 
APPLICANT: Miao, Gou-Hau 
APPLICANT: Famodu, Omolayo O. 
APPLICANT: Lee, Jian Ming 
APPLICANT: Sakai, Hajime 
APPLICANT: Weng, Zude 
APPLICANT: Caimi, Perry G 
APPLICANT: Anderson, Shawn 

TITLE OF INVENTION: Plant Metabolism Genes 
FILE REFERENCE: BB1378 US NA 
CURRENT APPLICATION NUMBER: US/09/614,912 
CURRENT FILING DATE: 2000-07-12 
PRIOR APPLICATION NUMBER: 60/143,401 



; PRIOR FILING DATE: 1999-07-12 

; PRIOR APPLICATION NUMBER: 60/143,412 

; PRIOR FILING DATE: 1999-07-12 

PRIOR APPLICATION NUMBER: 60/146,650 
; PRIOR FILING DATE: 1999-07-30 
; PRIOR APPLICATION NUMBER: 60/170,906 
; PRIOR FILING DATE: 1999-12-15 
; PRIOR APPLICATION NUMBER: 60/172,959 
; PRIOR FILING DATE: 1999-12-21 
; PRIOR APPLICATION NUMBER: 60/172,946 
; PRIOR FILING DATE: 1999-12-21 
; NUMBER OF SEQ ID NOS : 204 
; SOFTWARE: Microsoft Office 97 
; SEQ ID NO 144 

LENGTH: 539 

TYPE: PRT 
; ORGANISM: Triticum aestivum 

FEATURE : 

NAME/ KEY: UNSURE 
; LOCATION: (272).. (273) 
US-09-614-912-144 

Query Match 11.9%; Score 415; DB 4; Length 539; 

Best Local Similarity 25.4%; Pred. No. 1e-34; 

Matches 130; Conservative 102; Mismatches 216; Indels 64; Gaps 13; 

Qy 124 GGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQ 183 

I I : : I : I : : I I : : : I : I I : : I : I : I I I : I I I : 
Db 1 GGYIE-GEITVSGYPKKQETFARISGYCEQNDIHSPHVTIYESLVFSAWLRLPAEVDSER 59 

Qy 184 RDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGL 243 

| : | : : : : | I I I I : I : I : I : : I I : I : I I I : : I I I I I I I 

Db 60 RKMFIEEIMDLVELTSLRGALVGLPGVNGLSTEQRKRLTIAVELVANPSIIFMDEPTSGL 119 

Qy 244 D S FTAHNLVTT L S RLAKGNRLVL I S LHQ P RS D I FRL FD- LVLLMT S GT P I YLGAAQQ 299 

I : I : : I : I I : : : I I I III Mil: I I I : I I 

Db 120 DARAAAI VMRT VRNT VNT GRT WCT I HQP S I D I FEAFDEL FLMKRGGEEI YVG PVGQN S A 17 9 

Qy 300 -MVQYFTSI GHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQ — SLAAL 348 

: : : I I I I : I I I : : : : : I I : I . : : : I : : I 

Db 180 NLIEYFEEIEGISKIKDGY NPATWMLEVSS SAQEEMLGIDFAEVYRQSEL 229 

Qy 349 FLEKVQGFDDFLWKAE-AKELNTSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQIS 407 

: : : I : : I I I : I I I : : | : | 

Db 230 YQRNKELIKELSMPAPGSSDLNFPTQYSRSFVTQCLAC LWKQXXSYWRNPSY 281 

Qy 408 NDFRDLPTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDT AALLFM IG 455 

I I I : : I :|: | ::: |:| I ||:|:: I 

Db 282 TAVRLLFTIVI ALMFGTMFWDLGSKTRRSQDLFNAMGSMYAAVLYIGVQNSG 333 

Qy 456 ALIPFNVILDWSKCHSERSMLYYELEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIY 515 

: : | | : I I : : I I I : I : I I I I : : II : I I : I 

Db 334 SVQPVVW ERTVFYRERAAGMYSAFP YAFGQVAI EFP YVLVQALI YGGLVY 384 

Qy 516 WLTNLRPVPELFLLHFLLVWLWFCCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGF 575 

: ||: : : : : I : I : : : : I I I : I : I : 

Db 385 SMIGFEWTVAKFLWYLFFMYFTMLYFTFYGMMAVGLTPNESIAAIISSAFYNVWNLFSGY 444 



Qy 576 MINLDNLWIVPAWISKLSFLRWCFSGLMQIQF 607 

: I II | | : : I I I : I I 
Db 445 LIPRPKLPIWWRWYSWICPVAWTLYGLVASQF 476 



RESULT 6 

US-08-665-259-25 

Sequence 25, Application US/08665259 
Patent No. 6028173 
GENERAL INFORMATION: 

APPLICANT: Landes, Gregory M. 
APPLICANT: Burn, Timothy C. 
APPLICANT: Connors, Timothy D. 
APPLICANT: Dackowski, William R. 
APPLICANT: Van Raay, Terence J. 
APPLICANT: Klinger, Katherine W. 

TITLE OF INVENTION: NOVEL HUMAN CHROMOSOME 16 GENES, 

TITLE OF INVENTION: COMPOSITIONS, METHODS OF MAKING AND USING SAME 
NUMBER OF SEQUENCES: 73 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: GENZYME CORPORATION 
STREET: One Mountain Road 
CITY: Framingham 
STATE : Massachusetts 
COUNTRY: United States of America 
ZIP: 01701 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.30 
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/665,259 
FILING DATE: 17-JUN-1996 
CLASSIFICATION: 435 
ATTORNEY/AGENT INFORMATION: 
NAME: Dugan, Deborah A. 
REGISTRATION NUMBER: 37,315 
REFERENCE/DOCKET NUMBER: IG5-9.1 
TELECOMMUNICATION INFORMATION: 
TELEPHONE: (508) 872-8400 
TELEFAX: (508) 872-5415 
INFORMATION FOR SEQ ID NO: 25: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 1684 amino acids 
TYPE: amino acid 
TOPOLOGY: linear 
MOLECULE TYPE: protein 
US-08-665-259-25 

Query Match 7.2%; Score 253; DB 3; Length 1684; 

Best Local Similarity 26.9%; Pred. No. 1e-16; 

Matches 90; Conservative 66; Mismatches 126; Indels 52; Gaps 15; 

Qy 88 IRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLV--R 145 

:|:|: : ||: : : I : I I : : I : : I I II: :|:| : :l I 



Db 529 VRDLNLNLYEGQITVLLGHNGAGKTTTLSMLTGL FPPTSGRAYISGYEISQDMVQIR 585 



Qy 146 KCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRV 205 

I : I I I I I I I I I I I I I : : I : : : I : : : : I hi 
Db 586 KSLGLCPQHDI LFDNLTVAEHLYFYAQL KGLSRQKCPEEVKQMLHIIGLEDKWNSR- 641 

Qy 206 GNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLV 265 

I : I I I I I : : I I I : I : : I I I I I I I I I : I : : : II I :| : 

Db 642 SRFLSGGMRRKLSIGIALIAGSKVLILDEPTSGMDAISRRAIWDLLQR-QKSDRTI 696 

Qy 266 LISLH-QPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQ YFTSIGHPCPRYSNPA 318 

: : : | : h I I : : I I h : : I : I : I : I I 
Db 697 VLTTHFMDEADL — LGDRIAIMAKGELQCCGSSLFLKQKYGAGYHMTLVKEP HCNPE 751 

Qy 319 DFYVDLTSIDRRSKEREV--ATVEKAQSLAALFL EKVQGFDDFLWKAE--AKELNTS 371 

| : I I I : I : hi h Mill: 

Db 752 DISQLVHHHVPNATLESSAGAELSFILPRESTHRFEGLFAKLEKKQKELGIA 803 

Qy 372 THTVSLTLTQD TDCG TAVELPGM 394 

: |:|:: | I : : I | : 

Db 804 SFGASITTMEEVFLRVGKLVDSSMDIQAIQLPAL 837 



RESULT 7 

US-08-762-500-25 

Sequence 25, Application US/08762500 
Patent No. 6030806 
GENERAL INFORMATION: 

APPLICANT: Landes, Gregory M. 
APPLICANT: Burn, Timothy C. 
APPLICANT: Connors, Timothy D. 
APPLICANT: Dackowski, William R. 
APPLICANT: Van Raay, Terence J. 
APPLICANT: Klinger, Katherine W. 

TITLE OF INVENTION: NOVEL HUMAN CHROMOSOME 16 GENES, 

TITLE OF INVENTION: COMPOSITIONS, METHODS OF MAKING AND USING SAME 
NUMBER OF SEQUENCES: 83 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: GENZYME CORPORATION 
STREET: One Mountain Road 
CITY: Framingham 
STATE : Massachusetts 
COUNTRY: United States of America 
ZIP: 01701 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.30 
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/762,500 
FILING DATE: 09-DEC-1996 
CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 08/665,259 
FILING DATE: 17-JUN-1996 
PRIOR APPLICATION DATA: 



APPLICATION NUMBER: PCT/US96/10469 

FILING DATE: 17-JUN-1996 
ATTORNEY/AGENT INFORMATION: 

NAME: Dugan, Deborah A. 
; REGISTRATION NUMBER: 37,315 

; REFERENCE/DOCKET NUMBER: IG5-9.3 

TELECOMMUNICATION INFORMATION: 

TELEPHONE: (508) 872-8400 

TELEFAX: (508) 872-5415 
; INFORMATION FOR SEQ ID NO: 25: 
; SEQUENCE CHARACTERISTICS: 

LENGTH: 1684 amino acids 
; TYPE: amino acid 

TOPOLOGY: linear 
; MOLECULE TYPE: protein 
US-08-762-500-25 



Query Match 7.2%; Score 253; DB 3; Length 1684; 

Best Local Similarity 26.9%; Pred. No. 1e-16; 

Matches 90; Conservative 66; Mismatches 126; Indels 52; Gaps 15; 

Qy 88 IRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLV--R 145 

:|:|: : ||: : : I : i I : : I : : I I II: : :l I 

Db 529 VRDLNLNLYEGQITVLLGHNGAGKTTTLSMLTGL FPPTSGRAYISGYEISQDMVQIR 585 

Qy 146 KCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRV 205 

I : I I I I I I I I I I I I I : : I : : : I : :: : I hi 

Db 586 KSLGLCPQHDILFDNLTVAEHLYFYAQL— KGLSRQKCPEEVKQMLHIIGLEDKWNSR- 641 

Qy 206 GNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLV 265 

I : I I I I I: : I I I: h : I I I I I I I I h h : = I I I : I : 

Db 642 SRFLSGGMRRKLSIGIALIAGSKVLILDEPTSGMDAISRRAIWDLLQR-QKSDRTI 696 

Qy 266 LISLH-QPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQ YFTSIGHPCPRYSNPA 318 

::: | :|: I I : :| I |:: : I : I : I : I I 
Db 697 VLTTHFMDEADL--LGDRIAIMAKGELQCCGSSLFLKQKYGAGYHMTLVKEP HCNPE 751 

Qy 319 DFYVDLTSIDRRSKEREV--ATVEKAQSLAALFL EKVQGFDDFLWKAE--AKELNTS 371 

| : I MM : hi h I I Ml : 

Db 752 DISQLVHHHVPNATLESSAGAELSFILPRESTHRFEGLFAKLEKKQKELGIA 803 

Qy 372 THTVSLTLTQD TDCG TAVELPGM 394 

: I : I : : I h : I I : 

Db 804 SFGASITTMEEVFLRVGKLVDSSMDIQAIQLPAL 837 



RESULT 8 

US-08-762-500-75 

Sequence 75, Application US/08762500 
Patent No. 6030806 
GENERAL INFORMATION: 

APPLICANT: Landes, Gregory M. 
APPLICANT: Burn, Timothy C. 
APPLICANT: Connors, Timothy D. 
APPLICANT: Dackowski, William R. 
APPLICANT: Van Raay, Terence J. 
APPLICANT: Klinger, Katharine W. 



TITLE OF INVENTION: NOVEL HUMAN CHROMOSOME 16 GENES, 

TITLE OF INVENTION: COMPOSITIONS, METHODS OF MAKING AND USING SAME 
NUMBER OF SEQUENCES : 83 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: GENZYME CORPORATION 
STREET: One Mountain Road 
CITY: Framingham 
STATE: Massachusetts 
COUNTRY: United States of America 
ZIP: 01701 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 
SOFTWARE: PatentIn Release #1.0, Version #1.30 
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/762,500 
FILING DATE: 09-DEC-1996 
CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 08/665,259 
FILING DATE: 17-JUN-1996 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: PCT/US96/10469 
FILING DATE: 17-JUN-1996 
ATTORNEY/AGENT INFORMATION: 
NAME: Dugan, Deborah A. 
REGISTRATION NUMBER: 37,315 
REFERENCE/DOCKET NUMBER: IG5-9.3 
TELECOMMUNICATION INFORMATION: 
TELEPHONE: (508) 872-8400 
TELEFAX: (508) 872-5415 
INFORMATION FOR SEQ ID NO: 75: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 1704 amino acids 
TYPE: amino acid 
TOPOLOGY: linear 
MOLECULE TYPE: protein 
US-08-762-500-75 

Query Match 7.2%; Score 253; DB 3; Length 1704; 

Best Local Similarity 26.9%; Pred. No. 1.1e-16; 

Matches 90; Conservative 66; Mismatches 126; Indels 52; Gaps 15; 

Qy 88 IRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLV--R 145 

: I : I : : I I : : : I : I I : : i : : I I I I : : I : I : : I I 

Db 549 VRDLNLNLYEGQITVLLGHNGAGKTTTLSMLTGL FPPTSGRAYISGYEISQDMVQIR 605 

Qy 146 KCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRV 205 

I : Ml I I I I I I I I I I : : I : : : I : :: : I hi 
Db 606 KSLGLCPQHDILFDNLTVAEHLYFYAQL KGLSRQKCPEEVKQMLHIIGLEDKWNSR- 661 

Qy 206 GNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLV 265 

| : | | | M : : I I I : I : : I I I I I I I I h h : : II I : I : 

Db 662 SRFLSGGMRRKLSIGIALIAGSKVLILDEPTSGMDAISRRAIWDLLQR-QKSDRTI 716 



Qy 266 LISLH-QPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQ YFTSIGHPCPRYSNPA 318 



:: : I : | : | | : : I I I : : : I : I : I : I I 
Db 717 VLTTHFMDEADL--LGDRIAIMAKGELQCCGSSLFLKQKYGAGYHMTLVKEP HCNPE 771 

Qy 319 DFYVDLTSIDRRSKEREV--ATVEKAQSLAALFL EKVQGFDDFLWKAE--AKELNTS 371 

I : | | | : | : I : I I : Mill: 

Db 772 DISQLVHHHVPNATLESSAGAELSFILPRESTHRFEGLFAKLEKKQKELGIA 823 

Qy 372 THTVSLTLTQD TDCG TAVELPGM 394 

: I : I : : I I : : I I : 

Db 824 SFGASITTMEEVFLRVGKLVDSSMDIQAIQLPAL 857 



RESULT 9 

US-09-540-236-2029 

Sequence 2029, Application US/09540236 
Patent No. 6673910 
GENERAL INFORMATION: 
APPLICANT: Gary L. Breton et al . 

TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO 
MORAXELLA CATARRHALIS 

TITLE OF INVENTION: FOR DIAGNOSTICS AND THERAPEUTICS 
FILE REFERENCE: 2709.2005-001 
CURRENT APPLICATION NUMBER: US/09/540,236 
CURRENT FILING DATE: 2000-04-04 
NUMBER OF SEQ ID NOS: 3840 
SEQ ID NO 2029 
LENGTH: 360 
TYPE: PRT 

ORGANISM: M. catarrhalis 
US-09-540-236-2029 

Query Match 7.0%; Score 245.5; DB 4; Length 360; 

Best Local Similarity 25.3%; Pred. No. 4.6e-17; 

Matches 98; Conservative 78; Mismatches 136; Indels 75; Gaps 18; 

Qy 64 WFEQLA — QFKIPW RSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDV 117 

|:|| : I : : : I I I M :|: I I :: ::| IIIMM 

Db 19 WIQQFANNEKKMSYIQINNAHKSFGSLTV-IDDLNLNVEKGSLVTLLGPSGCGKSTLLRC 77 

Qy 118 ITGRGHGGKMKSGQIWINGQPST PQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMR 174 

|| : I M II I II : : : II I : II I : I I : M : : 

Db 78 IAGL ETLNQGSIILNNQDITYLKPQ--KRRIAMVFQNYALFPNMTVADNVEFGLKI- 131 

Qy 175 LPRTFSQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGIL 234 

: I :| :|:||: : I I : :|||:::||:: h Ml 

Db 132 — KKVS LEERLI KVKDVLDLVELTS FAQQK PESLSGGQKQRVALARALVMEPDLL 184 

Qy 235 ILDEPTSGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYL 294 

: | | | || I I : : I MM I : : : : M I : I : I 
Db 185 LLDEPLSALDAKLRKSLRMQIKRIQKELGLTTLFWHDQDEALAMSDEVVLLNKG 239 

Qy 295 GAAQQMVQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQ 354 

: :| M | I M : : I I : 

Db 240 KIEQHST PDTLYTQPNNRF TAGFIGHYN 267 

Qy 355 -GFDDFLWKAEAKELNTSTHTVSLTLTQDTDCGTAVELPGMI-EQFST — LIRRQISNDF 410 

| : : : I I : I : I : MM : : I I : II : I : : I I : I : 



Db 268 IGYFESVKSKSAKQLSMMAIRPE-TILLDTDDG— DIPGVILERTLTGGWRYQVRTDY 323 



Qy 411 RDL — PTLLIHGSEA CLMSLII 430 

I : : I I I : I : II I 

Db 324 GDI FDVDVLNHGKI SQLKVNCKVFLII 350 



RESULT 10 

US-09-252-991A-21665 

Sequence 21665, Application US/09252991A 
Patent No. 6551795 
GENERAL INFORMATION: 
APPLICANT: Marc J. Rubenfield et al . 

TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO 
PSEUDOMONAS 

TITLE OF INVENTION: AERUGINOSA FOR DIAGNOSTICS AND THERAPEUTICS 
FILE REFERENCE: 107196.136 

CURRENT APPLICATION NUMBER: US/09/252,991A 
CURRENT FILING DATE: 1999-02-18 
PRIOR APPLICATION NUMBER: US 60/074,788 
PRIOR FILING DATE: 1998-02-18 
PRIOR APPLICATION NUMBER: US 60/094,190 
PRIOR FILING DATE: 1998-07-27 
NUMBER OF SEQ ID NOS : 33142 
SEQ ID NO 21665 
LENGTH: 593 
TYPE: PRT 

ORGANISM: Pseudomonas aeruginosa 
US-09-252-991A-21665 

Query Match 6.7%; Score 235; DB 4; Length 593; 

Best Local Similarity 27.2%; Pred. No. 1.4e-15; 

Matches 68; Conservative 56; Mismatches 100; Indels 26; Gaps 7; 

Qy 89 RNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVR 145 

: | | : : : : | | I I I : : : I I II : : | | : | I : I : I II : 

[Rows Db 282 and Qy 146: residues not recoverable] 

I I I : I I : : : I I : : : I h : I I : I : I : I 
Db 337 lLFPNMTVOONVAFGLRMOKVP AAE LKQ RVAEAI E LVE L GE YA 389 

[Rows Qy 205, Db 390 and Qy 265: residues not recoverable] 

I I I : : I I : : I : I : I : I I I I I I I : = I : I : 

I I :: I I : I : I I : : : I : I I : 

Db 448 LSDRIVLMNAGRIVOSGDAETL YTAPENAFAAGFIGNY 499 

[Rows Qy 325 and Db 500: residues not recoverable] 



RESULT 11 

US-09-134-000C-3584 



Sequence 3584, Application US/09134000C 
Patent No. 6617156 
GENERAL INFORMATION: 
APPLICANT: Lynn Doucette-Stamm et al 

TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO 
TITLE OF INVENTION: ENTEROCOCCUS FAECALIS FOR DIAGNOSTICS AND THERAPEUTICS 
FILE REFERENCE: 032796-032 

CURRENT APPLICATION NUMBER: US/09/134,000C 
CURRENT FILING DATE: 1998-08-13 
PRIOR APPLICATION NUMBER: US 60/055,778 
PRIOR FILING DATE: 1997-08-15 
NUMBER OF SEQ ID NOS: 6812 
SOFTWARE: PatentIn version 3.1 
SEQ ID NO 3584 
LENGTH: 229 
TYPE: PRT 

ORGANISM: Enterococcus faecalis 
US-09-134-000C-3584 

Query Match 6.7%; Score 234; DB 4; Length 229; 

Best Local Similarity 27.0%; Pred. No. 3.5e-16; 

Matches 68; Conservative 58; Mismatches 88; Indels 38; Gaps 8; 

Qy 47 LEVRDLTYQVDIASQVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGS 106 
IMM: : : | :: : : : : : : I I I I : I : : I : I I 

Db 3 LEVRDMA NVLEMKNIYKKYGEKHTEVIALKELSFAVQPGEFVAVIGP 49 

Qy 107 SGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVRK CVAHVRQH DQ L L 158 

|| |::: I : I ||:: : II I :| :| : : I h 

[Rows Db 50 and Qy 159: residues not recoverable] 

||||: I : : I I : : I : : I I : : : I : I I : I I I I I 

Db 106 PFLTVEDQFHLIEKVDKSRKNSELK EQLLETLGLKE LRNSYPRDLSGGER 155 

Ml I I : : : II I I : II : I : : I I : : I I : : : : : I I 

[Rows Qy 219, Db 156, Qy 278 and Db 214: residues not recoverable] 



RESULT 12 

US-09-489-039A-10393 

; Sequence 10393, Application US/09489039A 

; Patent No. 6610836 

; GENERAL INFORMATION: 

; APPLICANT: Gary Breton et. al 

; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO 
KLEBSIELLA 

; TITLE OF INVENTION: PNEUMONIAE FOR DIAGNOSTICS AND THERAPEUTICS 

; FILE REFERENCE: 2709.2004001 

; CURRENT APPLICATION NUMBER: US/09/489, 039A 

; CURRENT FILING DATE: 2000-01-27 

; PRIOR APPLICATION NUMBER: US 60/117,747 



PRIOR FILING DATE: 1999-01-29 
NUMBER OF SEQ ID NOS : 14342 
SEQ ID NO 10393 
LENGTH: 265 
TYPE: PRT 

ORGANISM: Klebsiella pneumoniae 
US-09-489-039A-10393 

Query Match 6.7%; Score 233.5; DB 4; Length 265; 

Best Local Similarity 27.9%; Pred. No. 5.1e-16; 

Matches 64; Conservative 51; Mismatches 97; Indels 17; Gaps 4; 

Qy 86 LGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVR 145 

| : : I : I I : I : : : : I I I I I I : : : I I : : I I : I : I : : I I 

Db 25 LALQNVSFDIVEGETISLIGHSGCGKSTLLNLIA--GITTPTEGGLLCDNREIAGPGPER 82

Qy 146 KCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRV 205

|||: Ml |: : :| II |:::| : :| :| ::: : I 

Db 83 AWFQNHSLLPWLSCFDNV7VLAVDQVFRRTMSKSERREWIEHNLARVQMGHALHKRP 139 

Qy 206 GNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLV 265 

| : I I I : : I I I I I : I : I I I I I I : I :l I: : : 

Db 140 GE ISGGMKQRVGIARALAMKPKVLLLDEPFGALDALTRAHLQDTVMHIQQELNTT 194 

Qy 266 LISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRY 314 

: : : : I I I I : I I : I : I : : I I I : 

Db 195 I VMI T H DVD EAVL LS DRVLMMTN GPAAT VGE ILAVDLPRPRH 236 



RESULT 13 

US-09-252-991A-20719

; Sequence 20719, Application US/09252991A 
; Patent No. 6551795 
; GENERAL INFORMATION: 

APPLICANT: Marc J. Rubenfield et al . 
; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO 
PSEUDOMONAS 

; TITLE OF INVENTION: AERUGINOSA FOR DIAGNOSTICS AND THERAPEUTICS 
; FILE REFERENCE: 107196.136 

; CURRENT APPLICATION NUMBER: US/09/252,991A

; CURRENT FILING DATE: 1999-02-18 

; PRIOR APPLICATION NUMBER: US 60/074,788 

; PRIOR FILING DATE: 1998-02-18 

; PRIOR APPLICATION NUMBER: US 60/094,190 

PRIOR FILING DATE: 1998-07-27 
; NUMBER OF SEQ ID NOS: 33142 
; SEQ ID NO 20719 

LENGTH: 370 
; TYPE: PRT 

ORGANISM: Pseudomonas aeruginosa 
US-09-252-991A-20719 

Query Match 6.5%; Score 227; DB 4; Length 370;

Best Local Similarity 26.2%; Pred. No. 4.4e-15; 

Matches 90; Conservative 57; Mismatches 152; Indels 44; Gaps 



Qy 88 IRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVRKC 147



: 1:| : :|: : : I I I || : : I I : : I : 11-1 =111 I 

Db 28 VDNVSLTINTGEFFTLLGPSGCGKTTLLRMLAG FDQPDSGEIRLNGQDLAGVEPEKR 84 

Qy 148 VAH-VRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRVG 206

III | |:::| : :|| :| :::: 1 III : ::|| I 
Db 85 PVHTVFQSYALFPHMSVAQNIAFPLKM AGVAKS E I DARVEQAL KDVRLAD KG 136 

Qy 207 NTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLVL 266

: I I I : I : I I : I I : I : I : I I I I I I I : ' I I I : 

Db 137 GRMPTQLSGGQRQRVAIARALVNRPRLLLLDEPLSALDAKLREEMQIELINLQKDVGITF 196 

Qy 267 ISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPADFYVDLTS 326 

: : : : I : : I I | | : : : II III 

Db 197 VYVTHDQGEALALSHRIAVMNQGRVEQLDAPETIYSF PRSRFVADFI GQCNL 248 

Qy 327 IDRRSKEREVATVEKAQS LAALFLEKVQGFDDF-LWKAEAKELNTSTHTVSL TL 379 

:| I I I I : I :| I I I I I : I 

Db 249 LD ATVEAVDGERVRIDLRGLGEVQALKSFDAQPGEACVLTLRPEKIRLAQSV 300 

Qy 380 TQDTDCGTAVELPGMIEQF STLIRRQISNDFRDLPTLL 417 

I I: I I : : II :: I I I III 

Db 301 TADSD EVHFRGRVAELLYLGDVTLYIVELENGER-LETLL 339 



RESULT 14 

US-09-328-352-6329 

Sequence 6329, Application US/09328352 
Patent No. 6562958 
GENERAL INFORMATION : 
APPLICANT: Gary L. Breton et al. 

TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO 
ACINETOBACTER 

TITLE OF INVENTION: BAUMANNII FOR DIAGNOSTICS AND THERAPEUTICS 
FILE REFERENCE: GTC99-03PA 

CURRENT APPLICATION NUMBER: US/09/328,352
CURRENT FILING DATE: 1999-06-04 
NUMBER OF SEQ ID NOS : 8252 
SEQ ID NO 6329 
LENGTH: 359 
TYPE : PRT 

ORGANISM: Acinetobacter baumannii 
US-09-328-352-6329 

Query Match 6.4%; Score 223.5; DB 4; Length 359; 

Best Local Similarity 28.2%; Pred. No. 9.8e-15; 

Matches 67; Conservative 47; Mismatches 99; Indels 25; Gaps 6;

Qy 88 IRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVR-K 146 

::|:| I : : : I : : I I I II : : I I : I I ||: : |: :| II : 

Db 24 LKNISLDFPEGELVALLGPSGCGKTTLLRIIAGL ESADGGQVLLEGEDATNVHVRER 80 

Qy 147 CVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTF--SQAQRDKRVEDVIAELRLRQCANTR 204

| Ml I : : | | : : | I : : I I I I : I : III : : : : I I 
Db 81 QVGFVFQHYALFRHMTVFDNI AFGLRVR- PRATRP S EAEI KKRVTRLLDLVQLGFLA 136 

Qy 205 VGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRL 264 

: I :|||:|:|::: | I : I : I I I I II: I I I : 



Db 137 --DRYPAQLSGGQRQRIALARALAVEPRVLLLDEPFGALDAKWKELRRWLRNLHDELHI 194 



Qy 265 VLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPADFYV 322 

| : : : : | :::| I III I I I :| 

Db 195 TSIFVTHDQEEALEVADQIIVMNKGN VEQIGSPREVYEKPATPFV 239 



RESULT 15 

US-09-489-039A-8815

Sequence 8815, Application US/09489039A 
Patent No. 6610836 
GENERAL INFORMATION: 
APPLICANT: Gary Breton et al.

TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO 
KLEBSIELLA 

TITLE OF INVENTION: PNEUMONIAE FOR DIAGNOSTICS AND THERAPEUTICS 
FILE REFERENCE: 2709.2004001 
CURRENT APPLICATION NUMBER: US/09/489,039A
CURRENT FILING DATE: 2000-01-27
PRIOR APPLICATION NUMBER: US 60/117,747 
PRIOR FILING DATE: 1999-01-29 
NUMBER OF SEQ ID NOS : 14342 
SEQ ID NO 8815 
LENGTH: 388
TYPE: PRT 

ORGANISM: Klebsiella pneumoniae 
US-09-489-039A-8815

Query Match 6.4%; Score 223.5; DB 4; Length 388; 

Best Local Similarity 24.8%; Pred. No. 1.1e-14;

Matches 61; Conservative 57; Mismatches 77; Indels 51; Gaps 8; 

Qy 91 LSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQ PSTPQLVRKC 147

|| : |: : ::| lll|:::|| :: I : ||||:: : :||: :

Db 52 LSLDIHEGEFWLVGPSGCGKSTLLRLLAGL EPVSEGQIWLHNENITAATPR-- ERN 106

Qy 148 VAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRVGN 207

I : I : | | : | : | | : : | : : I : : I I : I I : I : :

Db 107

Qy 208

: I I I : I : I I : : : : I I : : : I I I I I I : I I I - :

Db 164 -LSGGQRQRVAMARAIVRNPRLFLMDEPLSNLD ARLRSEVRDSIM 207

Qy 268

: : I I I : I

Db 208 -HVQQVGRPEYLYAN 254

Qy 317

I I : : I

Db 255

Search completed: February 27, 2004, 07:20:14 
Job time : 17.2266 secs



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM protein - protein search, using sw model 

Run on: February 27, 2004, 06:44:33 ; Search time 14.9728 Seconds 

(without alignments) 
4317.206 Million cell updates/sec 



Title: US-09-989-981A-4
Perfect score: 3494
Sequence: 1 MAEKTKEETQLWNGTVLQDA...FLFLYYLSLKLIKQKSIQDW 672

Scoring table: BLOSUM62
Gapop 10.0 , Gapext 0.5

Searched: 283366 seqs, 96191526 residues
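
For orientation, the parameters above (sw model, Gapop 10.0, Gapext 0.5) describe a Smith-Waterman local alignment with affine gap penalties. The sketch below is only a minimal illustration of that recurrence, not GenCore's implementation. Two things in it are assumptions: `toy_score` is a hypothetical stand-in for the BLOSUM62 table, and a gap of length k is charged `gap_open + k * gap_ext`, which may differ from the convention the search engine uses.

```python
def toy_score(x, y):
    """Hypothetical substitution score standing in for BLOSUM62 (assumption)."""
    return 5.0 if x == y else -4.0


def smith_waterman_affine(a, b, score=toy_score, gap_open=10.0, gap_ext=0.5):
    """Return the best local alignment score of a and b (Gotoh recurrences).

    Assumed gap convention: a gap of length k costs gap_open + k * gap_ext.
    """
    neg_inf = float("-inf")
    m = len(b)
    h_prev = [0.0] * (m + 1)   # best score ending at (i-1, j)
    f = [neg_inf] * (m + 1)    # best score ending at (i, j) with a gap in b
    best = 0.0
    for i in range(1, len(a) + 1):
        h_cur = [0.0] * (m + 1)
        e = neg_inf            # best score ending at (i, j) with a gap in a
        for j in range(1, m + 1):
            # Extend an open gap or open a new one, whichever scores higher.
            e = max(e - gap_ext, h_cur[j - 1] - gap_open - gap_ext)
            f[j] = max(f[j] - gap_ext, h_prev[j] - gap_open - gap_ext)
            diag = h_prev[j - 1] + score(a[i - 1], b[j - 1])
            # Local alignment: never let a cell drop below zero.
            h_cur[j] = max(0.0, diag, e, f[j])
            best = max(best, h_cur[j])
        h_prev = h_cur
    return best


if __name__ == "__main__":
    # Ten matches (50) minus one gap of length three (10 + 3 * 0.5).
    print(smith_waterman_affine("AAAAACCCCC", "AAAAAGGGCCCCC"))
```

With the real BLOSUM62 table substituted for `toy_score`, aligning the query against itself would be expected to give the "Perfect score" reported above, although that has not been checked here.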



Total number of hits satisfying chosen parameters: 283366



Minimum DB seq length: 0 

Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0% 

Maximum Match 100% 
Listing first 45 summaries 



Database : PIR_78:*
1: pir1:*
2: pir2:* 
3: pir3:* 
4: pir4:* 

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
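
The report does not say how the total score distribution is analysed. As a hedged illustration only, the sketch below models chance scores with a Gumbel (extreme value) distribution fitted by the method of moments and returns the expected number of database hits at or above a given score. The Gumbel model, the fitting method, and the function name `expected_chance_hits` are all assumptions made for this example and are not taken from GenCore.

```python
import math
from statistics import mean, pstdev

EULER_GAMMA = 0.5772156649015329


def expected_chance_hits(score, background_scores):
    """Expected number of background hits scoring >= score.

    Assumption: background scores follow a Gumbel distribution whose
    location (mu) and scale (beta) are estimated by the method of moments.
    """
    beta = pstdev(background_scores) * math.sqrt(6.0) / math.pi
    mu = mean(background_scores) - EULER_GAMMA * beta
    z = (score - mu) / beta
    # P(S >= score) = 1 - exp(-exp(-z)); expm1 keeps precision for tiny tails.
    p_tail = -math.expm1(-math.exp(-z))
    return len(background_scores) * p_tail


if __name__ == "__main__":
    # Synthetic background scores, purely for demonstration.
    background = [20 + (i % 30) for i in range(300)]
    for s in (30, 60, 777):
        print(s, expected_chance_hits(s, background))
```

Under this model the value falls off roughly exponentially with the score, which agrees qualitatively with the very small Pred. No. values reported for the top hits.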



SUMMARIES 

                      %
Result            Query
   No.    Score   Match  Length  DB  ID      Description

     1      777    22.2     646   2  C86441  probable ABC trans
     2    749.5    21.5     725   2  C84423  probable ABC trans
     3    670.5    19.2     656   2  JC7860  brain multidrug re
     4    666.5    19.1     609   2  E96742  probable ABC trans
     5      655    18.7    1294   2  S77690  probable membrane
     6      639    18.3     638   2  G02068  white homolog - hu
     7      634    18.1     737   2  T46101  ABC transporter-li
     8    630.5    18.0     687   1  FYFFW   white protein - fr
     9    629.5    18.0     635   2  T08934  hypothetical prote
    10    601.5    17.2     720   2  T47648  ABC transporter-li
    11    595.5    17.0     590   2  B96573  protein F12M16.17
    12    582.5    16.7     725   2  T47652  ABC transporter-li
    13    581.5    16.6     559   2  B88474  protein C05D10.3 [

    14    580.5    16.6     708   2  T47650  ABC transporter-li
    15    579.5    16.6     649   2  A84509  probable ABC trans
    16        ?    16.3     687   2  D96553  hypothetical prote
    17    567.5    16.2       ?   2  T47649  ABC transporter-li
    18    566.5    16.2    1049   1  S19421  ATP-dependent perm
    19    563.5    16.1       ?   2  JC7777  ATP binding casset
    20      563    16.1     739   2  T45891  ABC transporter-li
    21        ?    16.1       ?   2  ?       hypothetical prote
    22        ?       ?       ?   2  G84791  probable ABC trans
    23        ?       ?     740   ?  T09567  probable ATP-bindi
    24        ?       ?       ?   2  H96552  hypothetical prote
    25        ?    15.3       ?   2  T34391  hypothetical prote
    26        ?       ?       ?   2  ?       hypothetical prote
    27      530    15.2     577   2  T04229  ABC-type transport
    28        ?       ?       ?   2  T19333  hypothetical prote
    29        ?    14.7       ?   2  T19189  hypothetical prote
    30        ?       ?       ?   2  F86313  hypothetical prote
    31        ?       ?    1443   2  T02491  probable ABC trans
    32        ?    14.3       ?   ?  G88839  protein C10C6.5
    33        ?       ?       ?   2  ?       ABC transporter-li
    34        ?       ?       ?   ?  ?       probable ABC trans
    35        ?       ?       ?   2  ?       F9L1.15 protein -
    36        ?       ?     547   2  ?       hypothetical prote
    37        ?       ?       ?   2  H96622  probable ABC trans
    38        ?       ?    1490   2  T02644  ABC-type transport
    39        ?       ?       ?   ?  D96693  protein Putative A
    40        ?       ?       ?   ?  ?       probable ABC trans

    41    464.5    13.3    1413   2  G84790  probable ABC trans
    42      451    12.9     675   1  FYFFB   brown protein - fr
    43    441.5    12.6    1564   2  S55517  probable transport
    44      426    12.2    1426   2  T30567  ATP-binding casset
    45    424.5    12.1    1490   2  T30550  ABC transport prot
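
The legible figures in this report are consistent with two simple ratios: the % Query Match column appears to be Score divided by the Perfect score (3494 for this query), and Best Local Similarity in each alignment block appears to be Matches divided by the total of Matches, Conservative, Mismatches and Indels. These relationships are inferred from the numbers printed here, not from GenCore documentation, so the sketch below is only a consistency check; the function names are hypothetical.

```python
def query_match_percent(score, perfect_score):
    """Score as a percentage of the query's self-alignment score (inferred)."""
    return round(100.0 * score / perfect_score, 1)


def best_local_similarity(matches, conservative, mismatches, indels):
    """Identical positions as a percentage of all alignment columns (inferred)."""
    columns = matches + conservative + mismatches + indels
    return round(100.0 * matches / columns, 1)


if __name__ == "__main__":
    # Result 1 (C86441): Score 777 against Perfect score 3494.
    print(query_match_percent(777, 3494))             # 22.2
    # Result 1 (C86441): Matches 214, Conservative 132, Mismatches 266, Indels 80.
    print(best_local_similarity(214, 132, 266, 80))   # 30.9
```

A check of this kind can help confirm OCR-damaged fields: for example, the score for result 1 must be close to 777, not 111, for its 22.2% Query Match to hold.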



ALIGNMENTS 



RESULT 1 
C86441 

probable ABC transporter [imported] - Arabidopsis thaliana 
C; Species: Arabidopsis thaliana (mouse-ear cress) 

C;Date: 02-Mar-2001 #sequence_revision 02-Mar-2001 #text_change 31-Mar-2001
C; Accession: C86441 

R;Theologis, A.; Ecker, J.R.; Palm, C.J.; Federspiel, N.A.; Kaul, S.; White, O.;
Alonso, J.; Altaf, H.; Araujo, R.; Bowman, C.L.; Brooks, S.Y.; Buehler, E.;
Chan, A.; Chao, Q.; Chen, H.; Cheuk, R.F.; Chin, C.W.; Chung, M.K.; Conn, L.;
Conway, A.B.; Conway, A.R.; Creasy, T.H.; Dewar, K.; Dunn, P.; Etgu, P.;
Feldblyum, T.V.; Feng, J.; Fong, B.; Fujii, C.Y.; Gill, J.E.; Goldsmith, A.D.;
Haas, B.; Hansen, N.F.; Hughes, B.; Huizar, L.
Nature 408, 816-820, 2000 

A;Authors: Hunter, J.L.; Jenkins, J.; Johnson-Hopson, C.; Khan, S.; Khaykin, E.;
Kim, C.J.; Koo, H.L.; Kremenetskaia, I.; Kurtz, D.B.; Kwan, A.; Lam, B.;
Langin-Hooper, S.; Lee, A.; Lee, J.M.; Lenz, C.A.; Li, J.H.; Li, Y.; Lin, X.; Liu,
S.X.; Liu, Z.A.; Luros, J.S.; Maiti, R.; Marziali, A.; Militscher, J.; Miranda,
M.; Nguyen, M.; Nierman, W.C.; Osborne, B.I.; Pai, G.; Peterson, J.; Pham, P.K.;
Rizzo, M.; Rooney, T.; Rowley, D.; Sakano, H.

A;Authors: Salzberg, S.L.; Schwartz, J.R.; Shinn, P.; Southwick, A.M.; Sun, H.;
Tallon, L.J.; Tambunga, G.; Toriumi, M.J.; Town, C.D.; Utterback, T.; van Aken,
S.; Vaysberg, M.; Vysotskaia, V.S.; Walker, M.; Wu, D.; Yu, G.; Fraser, C.M.;
Venter, J.C.; Davis, R.W.

A; Title: Sequence and analysis of chromosome 1 of the plant Arabidopsis. 

A;Reference number: A86141; MUID:21016719; PMID:11130712

A;Accession: C86441 

A; Status: preliminary 

A; Molecule type: DNA 

A; Residues: 1-646 <STO> 

A;Cross-references: GB:AE005172; NID:g11136734; PIDN:AAG31315.1; GSPDB:GN00141

C; Genetics : 

A; Map position: 1 

C;Superfamily: Arabidopsis thaliana probable ATP-binding cassette protein 
F12L6.1; ATP-binding cassette homology 

Query Match 22.2%; Score 777; DB 2; Length 646; 

Best Local Similarity 30.9%; Pred. No. 1.9e-52; 

Matches 214; Conservative 132; Mismatches 266; Indels 80; Gaps 22; 

Qy 1 MAEKTKEETQLWNGTVLQDASGLQDSLFSSESDNSLYF-TYSGQSN TLEVRDLT 53 

: | : : I : I : : I I I I : : : I : I I I : I : : : : : 

Db 6 IAPRPEED GGVMVQ GLPD-MSDTQSKSVLAFPTITSQPGLQMSMYPITLKEW 57 

Qy 54 YQVDIASQVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRAS 113

I : I I I I : I I : I : : : : I I : I I : : I I I I : : 

Db 58 YKVKI EQTSQCMGSWKSKE KTILNGITGMVCPGEFLAMLGPSGSGKTT 105 

Qy 114 LLDVITGRGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQM 173 

II :|| I II:: INI: : I : I I I I I : I I I I I I I I : 

Db 106 LLSALGGR — LSKTFSGKVMYNGQPFSGCIKRR-TGFVAQDDVLYPHLTVWETLFFTALL 162 

Qy 174 RLPRTFSQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGI 233

| | | :::::: I : M | I I I : I I :: I I I : I I I I : : I I I I I : : I I I : 

Db 163 RLPSSLTRDEKAEHVDRVIAELGLNRCTNSMIGGPLFRGISGGEKKRVSIGQEMLINPSL 222 

Qy 234 LILDEPTSGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIY 293 

I : I I I I I M I I I III : I I I : III I I I : -III I I : * I I Ml- Mill 
Db 223 LLLDEPTSGLDSTTAHRIWTIKRLASGGRTVWTIHQPSSRIYHMFDKVVLLSEGSPIY 282 

Qy 294 LGAAQQMVQYFTSIGHPCPRYSNPADFYVDLTS 1 D RRS KEREVAT VEKAQ S LAAL 348 

M I I : I I : I : I I I II : I I : : : I : I I I : : : I : 

Db 283 YGAAS SAVEYFSSLGFSTSLTVNPADLLLDLANGIPPDTQKETSEQEQKTVK — ETLVSA 340 

Qy 349 FLEKVQGFDDFLWKAEAKELNTSTHTVSLT LTQDTDCGTAVELPGMI EQFSTLI 402

: : : I : I : I : I : I I : I I I I : I : 

Db 341 YEKNIS TKLKAELCNAESHSYEYTKAAAKNLKSEQWCTT WWYQFTVLL 388 

Qy 403 RRQI-SNDFRDLPTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFN 461

: I : I II : : I I : I I I I I I I I 

Db 389 QRGVRERRFES FNKLRI F QVISVAFLGGLLWWHTPKS-HIQDRTALLFFFSVFWGFY 444 

Qy 462 VILDVVSKCHSERSMLYYELEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLR 521

: : I I : I I I I : I I I I : : I : I I I : III: I : 

Db 445 PLYNAVFTFPQEKRMLIKERSSGMYRLSSYFMARNVGDLPLELALPTAFVFIIYWMGGLK 504 



Qy 522 PVPELFLLHFLLVWLVVFCCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDN 581



I I I : I I : I I : : M I : I : : : I : I : : 

Db 505 PDPTTFILSLLVVLYSVLVAQGLGLAFGALIJytNIKQATTLASVTTLVFLIAGGYYVQQIP 564 

Qy 582 LWIVPAWISKLSFLRWCFSGLMQIQFNGHLY TTQIGNFTFS ILGDTMI SAMDL 634 

: I I I : I I : : I : I : II : I —hi I : I I 

Db 565 PFIV— WLKYLSYSYYCYKLLLGIQYTDDDYYECSKGVWCRVGDF PAIKSMGL 615 

Qy 635 NSHPLYAIYLIVIGIS-YGFLFLYYLSLKLIK 665 

I : I : I : I : I : : I : : I : I 

Db 616 NN LWI DVFVMGVMLVGYRLMAYMALHRVK 644 



RESULT 2 
C84423 

probable ABC transporter [imported] - Arabidopsis thaliana
C; Species: Arabidopsis thaliana (mouse-ear cress) 

C;Date: 02-Feb-2001 #sequence_revision 02-Feb-2001 #text_change 02-Feb-2001 
C;Accession: C84423 

R;Lin, X.; Kaul, S.; Rounsley, S.D.; Shea, T.P.; Benito, M.I.; Town, C.D.;
Fujii, C.Y.; Mason, T.M.; Bowman, C.L.; Barnstead, M.E.; Feldblyum, T.V.; Buell,
C.R.; Ketchum, K.A.; Lee, J.J.; Ronning, C.M.; Koo, H.; Moffat, K.S.; Cronin,
L.A.; Shen, M.; VanAken, S.E.; Umayam, L.; Tallon, L.J.; Gill, J.E.; Adams,
M.D.; Carrera, A.J.; Creasy, T.H.; Goodman, H.M.; Somerville, C.R.; Copenhaver,
G.P.; Preuss, D.; Nierman, W.C.; White, O.; Eisen, J.A.; Salzberg, S.L.; Fraser,
C.M.; Venter, J.C.
Nature 402, 761-768, 1999 

A; Title: Sequence and analysis of chromosome 2 of the plant Arabidopsis 
thaliana. 

A;Reference number: A84420; MUID:20083487; PMID:10617197
A; Accession: C84423 
A; Status: preliminary 
A; Molecule type: DNA 
A; Residues: 1-725 <STO> 

A;Cross-references: GB:AE002093; NID:g4262239; PIDN:AAD14532.1; GSPDB:GN00139
C; Genetics : 
A;Gene: At2g01320 
A;Map position: 2 



Query Match 21.5%; Score 749.5; DB 2; Length 725;

Best Local Similarity 29.2%; Pred. No. 3.1e-50;
Matches 180; Conservative 128; Mismatches 260; Indels 49; Gaps 10;

Qy 73                                                              120

I I I
: : I : I
I : : I I I : I II I : : I I : I : I

Db 70                                                              129

Qy 121 RGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFS 180

II M : : I I : I I : : : : I I I I I I I I I I I I : I I : : : I I I

Db 130 RLH LSGLLEVNGKPSSSKAYK -- LAFVRQEDLFFSQLTVRETLSFAAELQLPEISS 183

Qy 181 QAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPT 240

: I I : I : : : : I I I I : : I I : I I I : I I I I : : I : I : : I : : I : : I I I I

Db 184 AEERDEYVNNLLLKLGLVSCADSCVGDAKVRGISGGEKKRLSLACELIASPSVIFADEPT 243

Qy 241

I I I : I I
I I
I : I : I I
I : I II : I

Db 244


Qy 300 MVQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDF 359

: I I : I I I : I I I : I I I I : I II : : :: : I I 

Db 304 PLTYFGNFGFLCPEHVNPAEFLADLISVDYSSSETVYSSQKRVHALVDAF 353 

Qy 360 LWKAEAKELNTSTHTVSLTLTQDTDCGTAVELPGMIE QFSTLIRRQISNDFRD 412 

: : : : : I : : : : I I : : I I I I : : I II 

Db 354 SQRSSSVLYATPLSMKEETKNGMRPRRKAIVERTDGWWRQFFLLLKRAWMQASRD 408 

Qy 413 LPTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDVVSKCHS 472

II: : :: | I ::: I I I I I I : : I 

Db 409 GPTNKVRARMSVASAVT FGSVFWRMGKSQTSIQDRMGLLQVAAINTAMAALTKTVGVFPK 468 

Qy 473 ERSMLYYELEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFL 532 

||::: I I I : I I I : I : I : I I : : : : : I : II I 
Db 469 ERAIVDRERSKGSYSLGPYLLSKTIAEIPIGAAFPLMFGAVLYPMARLNPTLSRFGKFCG 528 

Qy 533 LVWLVVFCCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKL 592

: | : | || I I : I : : ^ : I I : I : : I I I I : I I : 

Db 529 I VT VE S FAAS AMG LT VGAMVP S T EAAMAVG P S LMT VF I VFGG Y YVN ADNT P 1 1 F RW I P RA 588 

Qy 593 SFLRWCFSGLMQIQFNGHLYTTQIGNFTFSI-LGDTMISAMDLNSHPLYAIYLIVIGISY 651 

I : I I I I I : I : I : I I I : I : : : : I 
Db 589 SLIRWAFQGLCINEFSGLKFDHQ NTFDVQTGEQALERLSFGGRRIRE TIAAQS 641 

Qy 652 GFLFLYYLSLKLIKQKS 668 

I : I : I : : I : 
Db 642 RI LMFWYSAT YLLLEKN 658 



RESULT 3 
JC7860 

brain multidrug resistance protein, BMDP - pig 
C; Species: Sus scrofa domestica (domestic pig) 

C;Date: 18-Nov-2002 #sequence__revision 18-Nov-2002 #text_change 31-Mar-2003 
C;Accession: JC7860
R;Eisenblaetter, T.; Galla, H.J. 

Biochem. Biophys. Res. Commun. 293, 1273-1278, 2002 

A; Title: A new multidrug resistance protein at the blood-brain barrier. 

A;Reference number: JC7860; MUID: 22050127 ; PMID: 12054514 

A;Accession: JC7860 

A; Molecule type: mRNA 

A; Residues: 1-656 <EIS> 

A; Cross-references : GB : AJ420927 

A; Experimental source: brain 

C;Comment: This protein, a new transport protein of the ATP-binding cassette
(ABC) superfamily of transporters, expressed in porcine brain capillary
endothelial cells, plays an important role in the exclusion of xenobiotics from
the brain and participates in drug transport across the blood-brain barrier and
therefore is considered an efflux pump at the cerebral endothelium.
C; Genetics : 
A; Gene: bmdp 



Query Match 19.2%; Score 670.5; DB 2; Length 656;

Best Local Similarity 28.3%; Pred. No. 3.9e-44;
Matches 201; Conservative 125; Mismatches 258; Indels 125; Gaps 24;



Qy 18 QDASGLQDSLFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIASQVPWFEQLAQFKIPWRS 77 

:: :|| I I I I : I I I: hi : I : : : :| 
Db 15 RNTNGLPGS S SNELKTSAGGA — VXS FHDI CYRVKVKSGFLFCRKTVEKEI 63 

Qy 78 HSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGR--GHGGKMKSGQIWIN 135

: |:: : : I : I I : I : I I : : I I I I I : I II II : II 

Db 64 LTNINGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPHG LSGDVLIN 109 

Qy 136 GQPSTPQLVRKC-VAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAE 194

M II : I I I : : I I I I I I I I : I I I I : : : : : I : III 

Db 110 GAPRPANF — KCNSGYWQDDVVMGTLTVRENLQFSAALRLPTTMTNHEKNERINMVIQE 167 

Qy 195 LRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTT 254 

| | : |:::|| ::|||MIII:| II = M = M II I I I 1 I * t I I I II: :: 
Db 168 LGLDKVADSKVGTQFIRGVSGGERKRTSIAMELITDPSILFLDEPTTGLDSSTANAVLLL 227 

Qy 255 LSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRY 314

I I :: I | : : | : I I I I I I : I I I : I : I I : : I I : : : I I I I I : I I 
Db 228 LKRMSKQGRTIIFSIHQPRYSIFKLFDSLTLLASGRLMFHGPAREALGYFASIGYNCEPY 287 

Qy 315 SNPADFYVD LTSIDR RS KE REVAT VEKAQ S LAAL FL E 351 

:|||||::| h M l=: : M Ml = 
Db 288 NNPADFFLDVINGDSSAWLSRADRDEGAQEPEEPPEKDTPLIDK LAAFYTNSSFFK 344 

Qy 352 — KVQGFDDFLW KAEAKELNTSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQI 4 06 

||: I I I :: Ml I I : II 

Db 345 DTKVE-LDQFSGGRKKKKSSVYKEVTYTTS FCHQLRWISRRSF 386 

Qy 407 SNDFRDLPTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDV 466

| : : : : | : | | : : | : I : I I : 
Db 387 KNLLGNPQASVAQIIVTIILGLVIGAI FYDLKNDPSGIQNRAGVLFFL T 435 

Qy 467 VSKCHS ERSMLYYELEDGLYTAGP YFFAKI LGE- LPEHCAYVI I YAMPI Y 515 

:: | | I : : : I II I I I I : I : I I I M I 

Db 436 TNQCFSSVSAVELLWEKKLFIHEYISGYYRVSSYFFGKLLSDLLPMRMLPSIIFTCITY 495 

Qy 516 WLTNLRPVPELFLLHFLLVWLVVFCCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGF 575

: I I : I I : : : I : =1111 * I ::: : : M 

Db 496 FLLGLKPAVGSFFIMMFTLMMVAYSASSMALAIAAGQSVVSVA 555 

Qy 576 MINLDNLWI VP — AW I SKLS FLRWCFSGLMQIQFNGH LYTTQI GNFTFS I LGDT- 627 

::|| :|| :|: I hill Ml II MM 

Db 556 LVNLKT— WPWLSWLQYFSIPRYGFSALQYNEFLGQNFCPGLNVTTNNTCSFAICTGAE 613 

Qy 628 MI SAMDL-NSHPLYAI YLIVIGI S YGFLFLYYLSLKLIKQKS 668 

: | | I : I I : : : II = II 1 1=1= I 

Db 614 YLENQGI SLSAWGLWQNHVALACMMVI FLT I AYLKLLLLKKYS 656 



RESULT 4 
E96742 

probable ABC transporter F17M19.11 [imported] - Arabidopsis thaliana 
C; Species: Arabidopsis thaliana (mouse-ear cress) 

C;Date: 02-Mar-2001 #sequence_revision 02-Mar-2001 #text_change 23-Mar-2001 
C; Accession: E96742 

R;Theologis, A.; Ecker, J.R.; Palm, C.J.; Federspiel, N.A.; Kaul, S.; White, O.;
Alonso, J.; Altaf, H.; Araujo, R.; Bowman, C.L.; Brooks, S.Y.; Buehler, E.;



Chan, A.; Chao, Q.; Chen, H.; Cheuk, R.F.; Chin, C.W.; Chung, M.K.; Conn, L. ; 
Conway, A.B.; Conway, A.R.; Creasy, T.H.; Dewar, K. ; Dunn, P.; Etgu, P.; 
Feldblyum, T.V.; Feng, J.; Fong, B . ; Fujii, C.Y.; Gill, J.E.; Goldsmith, A.D.; 
Haas, B.; Hansen, N.F.; Hughes, B.; Huizar, L. 
Nature 408, 816-820, 2000 

A;Authors: Hunter, J.L.; Jenkins, J.; Johnson-Hopson, C.; Khan, S.; Khaykin, E.;
Kim, C.J.; Koo, H.L.; Kremenetskaia, I.; Kurtz, D.B.; Kwan, A.; Lam, B.;
Langin-Hooper, S.; Lee, A.; Lee, J.M.; Lenz, C.A.; Li, J.H.; Li, Y.; Lin, X.; Liu,
S.X.; Liu, Z.A.; Luros, J.S.; Maiti, R.; Marziali, A.; Militscher, J.; Miranda,
M.; Nguyen, M.; Nierman, W.C.; Osborne, B.I.; Pai, G.; Peterson, J.; Pham, P.K.;
Rizzo, M.; Rooney, T.; Rowley, D.; Sakano, H.

A;Authors: Salzberg, S.L.; Schwartz, J.R.; Shinn, P.; Southwick, A.M.; Sun, H.;
Tallon, L.J.; Tambunga, G.; Toriumi, M.J.; Town, C.D.; Utterback, T.; van Aken,
S.; Vaysberg, M.; Vysotskaia, V.S.; Walker, M.; Wu, D.; Yu, G.; Fraser, C.M.;
Venter, J.C.; Davis, R.W.

A; Title: Sequence and analysis of chromosome 1 of the plant Arabidopsis. 

A; Reference number: A86141; MUID: 21016719; PMID : 11130712 

A; Accession: E96742 

A; Status : preliminary 

A;Molecule type: DNA 

A; Residues: 1-609 <STO> 

A; Cross-references: GB:AE005173; NID: g6978921; PIDN: AAF34313 . 1 ; GSPDB: GN00141 

C; Genetics : 

A; Gene: F17M19.11 

A;Map position: 1 

C;Superfamily: fruit fly white protein; ATP-binding cassette homology

Query Match 19.1%; Score 666.5; DB 2; Length 609; 

Best Local Similarity 30.2%; Pred. No. 7.1e-44; 



Matches 195; Conservative 107; Mismatches 246; Indels 97; Gaps 20;

Qy 80 SQDSCE LGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVI 118

| Ml ||:: :: : |: :|::| II |:::||: :

Db 2 SNDSCNIKKLLGLKQKPSDETRSTEERTILSGVTGMISPGEFMAVLGPSGSGKSTLLNAV 61

Qy 119 TGRGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRT 178

MM : II 1 1 ::: 1 1 1 1 1 : 1 1 1 1 1 1 1 1 : 1 : 1 1 1 1 :

Db 62 AGRLHGSNL-TGKILINDGKITKQTLKR-TGFVAQDDLLYPHLTVRETLVFVALLRLPRS 119

Qy 179 FSQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDE 238

: : : : | | | : | | | : | I I 1 1 1 1 :: M : 1 1 1 1 1 : 1 1 1 1 : 1 1 1 1 : 1 : II 1

Db 120 LTRDVKLRAAESVISELGLTKCENTWGNTFIRGISGGERKRVSIAHELLINPSLLVLDE 179

Qy 239 PTSGLDSFTAHNLVTTLSRLAKG-NRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAA 297

MIMI: | MIMIII : 1 : 1 : 1 1 1 1 : 1 : : 1 1 1 1 1 : : 1

Db 180 PTSGLDATAALRLVQTLAGLAHGKGKTWTSIHQPSSRVFQMFDTVLLLSEGKCLFVGKG 239

Qy 298 QQMVQYFTSIGHPCPRYSNPADFYVDLTSIDRRS KEREVATVEKAQSLAALFLEKVQ 354

: : II 1:1 Mill Ml : :: III 1 hi :

Db 240 RDAMAYFESVGFSPAFPMNPADFLLDLANGVCQTDGVTEREKPNVR -- QTLVTAY 292

Qy 355 GFDDFLWKAEAKELNTSTH TVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQISNDF 410

| | : I M Ml: | |: I : :

Db 293 DTLLAPQVKTCIEVSHFPQDNARFVKTRVNGGGITTCIATWFSQLCILLHR-LLKER 348

Qy 411 RDLPTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMI GAL I P FN VI LDV 466

i i . * I • - I • • • • • 1 1 1 1 1 i 1 M :

Db 349 RHESFDLLRIFQWAASILCGLMWWHSDYRDVH--DRLGLLFFISIFWGVLPSFNAVFTF 406



Qy 467 VSKCHSERSMLYYELEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPEL 526

II:: I I : I I I I I : I I I ' I I : I I I 

Db 407 PQERAI FTRERASGMYTLSSYFMAHVLGSLSMELVLPASFLTFTYWMVYLRPGIVP 462

Qy 527 FLLHFLLVWLVVFCCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVP 586

Ml :: I I : : II I : :| :| || |: :| II 
Db 463 FLLTLSVLLLYVLASQGLGI^GAAIMDAKKASTIVTVTMLAFVLTGGYYVNK VP 517



Qy 587 A WISKLSFLRWCFSGLMQIQF NGHLYTTQI G-NFT- FSI L 624

: I : :| : I: I : I I: I III - 

Db 518 SGMVWMKYVSTTFYCYRLLVAIQYGSGEEILRMLGCDSKGKQGASAATSAGCRFVEEEVI 577 



Qy 625 GD TMISAMDLNSHPLYAIYLIVIGISYGFLFLYYLSLKLIK 665 

|| I : : I : : | : | M : | : | | 

Db 578 GDVGMWT S VGVLFL MFFGYRVLAYLALRRIK 608 



RESULT 5 
S77690 

probable membrane protein YOL075c - yeast (Saccharomyces cerevisiae) 
N;Alternate names: hypothetical protein 01125; hypothetical protein O1130; 
hypothetical protein YOL074c 
C; Species: Saccharomyces cerevisiae 

C;Date: 21-Apr-1997 #sequence_revision 09-May-1997 #text_change 19-Apr-2002 

C;Accession: S77690; S66767; S66768 

R;Alexandraki, D.; Katsoulou, C; Tzermia, M. 

submitted to the Protein Sequence Database, July 1996 

A;Reference number: S66756

A; Accession: S77690 

A;Molecule type: DNA 

A; Residues: 1-1294 <ALE> 

A;Cross-references: EMBL : Z74816 ; MIPS:YOL075c 

A;Note: this is a revision to the sequence from reference S66756 
A;Accession: S66767 
A;Molecule type: DNA 

A;Residues: 1-179, ' TTRTGVFLWKRED 1 <ALW> 

A; Cross-references: EMBL:Z74816 

A; Experimental source: strain S288C 

A; Note: this sequence has been revised in reference S77690 

A;Note: this was assumed to be protein YOL074c 

A;Accession: S66768 

A; Molecule type: DNA 

A; Residues: 200-1294 <ALF> 

A;Cross-references: EMBL:Z74817

A; Experimental source: strain S288C 

A;Note: this sequence has been revised in reference S77690 

A;Note: this was assumed to be the complete sequence of protein YOL075c 

C; Genetics : 

A;Cross-references: SGD: S0005435 
A;Map position: 15L 
A;Note: YOL075c 

C;Superfamily: unassigned ATP-binding cassette proteins; ATP-binding cassette 
homology 

C; Keywords: ATP; nucleotide binding; P-loop; transmembrane protein 
F;45-263/Domain: ATP-binding cassette homology <ABC1> 



F; 62-69/Region: nucleotide-binding motif A (P-loop) 
F;376-392/Domain: transmembrane #status predicted <TM1> 
F;469-485/Domain: transmembrane #status predicted <TM2> 
F;496-512/Domain: transmembrane #status predicted <TM3> 
F; 606-622/Domain: transmembrane #status predicted <TM4> 
F;710-916/Domain: ATP-binding cassette homology <ABC2> 
F;727-734/Region: nucleotide-binding motif A (P-loop) 
F;1042-1058/Domain: transmembrane #status predicted <TM5>
F; 1125-1141/Domain: transmembrane #status predicted <TM6> 
F; 1177-1193/Domain: transmembrane #status predicted <TM7> 
F;1269-1285/Domain: transmembrane #status predicted <TM8> 

Query Match 18.7%; Score 655; DB 2; Length 1294; 

Best Local Similarity 28.1%; Pred. No. 1.6e-42; 

Matches 173; Conservative 115; Mismatches 272; Indels 56; Gaps 13; 

Qy 88 I RNLS FKVRS GQMLAI I GS S GCGRAS LLDVI TGRGHGGKMKS GQI 132 

: | : | | :: | :: | I I I : : I I : I : : II : I I 
Db 45 WTFSMDLPSGSVMAVMGGSGSGKTTLLNVLASKISGGLTHNGSIRYVLEDTGSEPNETE 104 

Qy 133 WINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKR- 187 

::|| | :|:: I I I I II I I I I I I I : ::| I: 

Db 105 PKRAHLDGQ-DHPIQKHVIMAYLPQQDVLSPRLTCRETLKFAADLKL NSSERTKKL 159 

Qy 188 -VEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSF 246 

|| : | | | I : M : I I I : I I : I I I I : I I : I I I I : : II I : I I I I I : I I I : : 
Db 160 MVEQLIEELGLKDCADTLVGDNSHRGLSGGEKRRLSIGTQMISNPSIMFLDEPTTGLDAY 219 

Qy 247 TAHNLVTTLSRLAK-GNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFT 305

: I : : I I : I I I I : : I : I I I I I I I I I I : : : I : I : II 

Db 220 SAFLVIKTLKKLAKEDGRTFIMSIHQPRSDILFLLDQVCILSKGNWYCDKMDNTIPYFE 279 

Qy 306 SIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFLWKAEA 365

Ml: I : I I I I : : : I I : I : I I I : I I I : M : : I : I 
Db 280 SIGYHVPQLVNPADYFIDLSSVDSRSDKEEAATQSRLNSL IDHWHDY ER 328

Qy 366 KELNTSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACL 425 

| : |: II :| : I II :| I II: :| : 

Db 329 THLQLQAESYISNATEIQIQNMTTRLP-FWKQVTVLTRRNFKLNFSDYVTLISTFAEPLI 387 

Qy 426 MSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIP--FNVILDVVSKCHSERSMLYYELED 483

: : | : : | I : : I : : : : I I : : : I : 

Db 388 IGTVCGWIYYKPDKSSIGGLRTTTACLYASTILQCYLYLLFDTYRLCEQDIALYDRERAE 447 

Qy 484 GLYTAGPYFFA-KILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLVVFCCR 542

II till I : I : I : I :: I I I : I : I I 

Db 448 GSVTPLAFIVARKISLFLSDDFAMTMIFVSITYFMFGLEADARKFFYQFAWFLCQLSCS 507

Qy 543 TMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGL 602

::: : |: I :| I : : II :l : : II "I : I I 

Db 508 GLSMLSVAVSRDFSKASLVGNMTFTVXSMGCGFFWAKVMPVYVRW 567 

Qy 603 MQIQFNGHLYTTQ 1 GNFTFSI LGDTMI SAMDLNSHPLYAI YLIVI GI S Y GFL 654 

| | II : I I : I I : I : I = : I I : 

Db 568 MSSTFTNSYCTTDNLDECLGNQILEVYG FPRNWITVPAWLLCWSVGYFWGAI 621 



Qy 655 FLYYLSLKLIKQKSIQ 670



Db 622 ILYLHKIDITLQNEVK 637



RESULT 6 
G02068 

white homolog - human 

C; Species: Homo sapiens (man) 

C;Date: 21-Dec-1996 #sequence__revision 06-Jun-1997 #text_change 02-Feb-2001 
C; Accession: G02068 

R;Croop, J.M.; Tiller, G. ; Fletcher, J. A.; Lux, M. ; Raab, E.; Goldenson, D. ; 
Arciniegas, S.; Son, D . ; Wu, R. 

submitted to the EMBL Data Library, August 1995 
A;Reference number: H00769
A; Accession: G02068 

A; Status: preliminary; translated from GB/EMBL/DDBJ 
A; Molecule type: mRNA 
A; Residues: 1-638 <CRO> 

A;Cross-references: EMBL:U34919; NID:g1314276; PIDN:AAC51098.1; PID:g1314277
C; Genetics : 
A; Gene: white 

C;Superfamily: fruit fly white protein; ATP-binding cassette homology 
C; Keywords: ATP; nucleotide binding; P-loop 
F; 61-253/Domain: ATP-binding cassette homology <ABC> 
F;78-85/Region: nucleotide-binding motif A (P-loop) 

Query Match 18.3%; Score 639; DB 2; Length 638; 

Best Local Similarity 26.5%; Pred. No. 1.1e-41;

Matches 184; Conservative 138; Mismatches 262; Indels 110; Gaps 20; 

Qy 8 ETQLWNGTVLQDASGLQDS-LFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIASQVPWFE 66 

I I I I I : : : I : : I I I : : : I I I I : I I : I I 

Db 5 ETDLLNGHLKKVDNNLTEAQRFSSLPRRA AVNI EFRDLS YSV PEGPW— 51 

Qy 67 QLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGK 126 

|| : : : : I I | I : : : I I : I II I : : : I : : : : I I 

Db 52 WRKKGYKTL LKGISGKFNSGELVAIMGPSGAGKSTLMNILAGYRETG- 98 

Qy 127 MKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDK 186 

I I I : I M I : II : : I I I II : I I I : I : I I I : 

Db 99 MK-GAVLINGLPRDLRCFRKVSCYIMQDDMLLPHLTVQEAMMVSAHLKLQE— KDEGRRE 155 

Qy 187 RVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSF 246 

| : : : : | | I I I I I I : : I I I : I : I : : I : : I : I I : : I I I I I I I I I 

Db 156 MVKEILTALGLLSCANTRTGS LSGGQRKRLAIALELVNNPPVMFFDEPTSGLDSA 210 

Qy 247 TAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTS 306 

: : | : : I I : I I : : : : I I I : : I I I I : : : : I : I I : I I 
Db 211 SCFQWSLMKGLAQGGRSIICTIHQPSAKLFELFDQLYVLSQGQCVYRGKVCNLVPYLRD 270 

Qy 307 IGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDD 358 

:| II I Mill : IN: I : I : I I 

Db 271 LGLNCPTYHNPADFVM EVASGEYGDQNSRLVRAVREGMCDSDHKRDLG 318 

Qy 359 FLWKAEAKELNTSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQISNDFR 411 

Ml ::|: : I : I : II I :| : I 

Db 319 GDAEVNPFLWHRPSEEVKQTKRLKGLRKDSSSMEGCHSFSASCLTQFCILFKRTFLSIMR 378 



Qy 412 DLPTLLIHGSEACLMSLIIGFLYYGHG — AKQL S FMDTAALLFMI GALI P FNVI LD 465 

I : : : I : I I I I I I I I : : I : : I I I I : I 
Db 379 DSVLTHLRITSHIGIGLLIGLLYLGIGNEAKKVLSNSGFLFFSMLFLMFAALMP 432 

Qy 466 VVSKCHSERSMLYYELEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYW 516 

:: I : I I I : I : | : | | : :: I : : I : I I 

Db 433 TVLTFPLEMGVFLREHLNYWYSLKAYYLAKTMADVPFQIMFPVAYCSIVYW 483 

Qy 517 LTNLRPVPELFLLHFLLVWLVVFCCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFM 576 

:|: |:| I : ::: I I : :::| I :| I 

Db 484 MTSQPSDAVAFVLFAALGTMTSLVAQSLGLLIGAASTSLQVATFVGPVTAIPVLLFSGFF 543 

Qy 577 INLDNLWIVPAWISKLSFLRWCFSGLMQIQFNG HLYTTQIGNFTFSILGDTMIS 630 

: : I : | : | : | : : | : I I : : : I I : : I I : : : 
Db 544 VSFDTIPTYLQWMSYISYVRYGFEGVI-LSIYGLDREDLHCDIDETCHFQKSEAILR 599 

Qy 631 AMDLNSHPLYAIYLIVIGISYGFLFLYYLSLKLI 664 

: I : : I I : I I : I I : : : I I : I I 

Db 600 ELDVENAKL Y- LDFI VLGI FFISLRLI 625 



RESULT 7 
T46101 

ABC transporter-like protein - Arabidopsis thaliana 

N;Alternate names: protein T25B15.80 
C;Species: Arabidopsis thaliana (mouse-ear cress) 
C;Date: 04-Feb-2000 #sequence_revision 04-Feb-2000 #text_change 04-Feb-2000 
C;Accession: T46101 
R;Alcaraz, J.P.; Clabault, G.; Cottet, A.; Mache, R.; Mewes, H.W.; Lemcke, K.; 
Mayer, K.F.X.; Quetier, F.; Salanoubat, M. 
submitted to the Protein Sequence Database, January 2000 
A;Reference number: Z23021 
A;Accession: T46101 
A;Status: preliminary 
A;Molecule type: DNA 
A;Residues: 1-737 <ALC> 
A;Cross-references: EMBL:AL132972 
A;Experimental source: cultivar Columbia; BAC clone T25B15 
C;Genetics: 
A;Map position: 3 
A;Introns: 122/1; 146/3; 225/2; 277/2; 338/3; 422/2; 535/1; 628/3; 664/3 
A;Note: T25B15.80 

Query Match 18.1%; Score 634; DB 2; Length 737; 

Best Local Similarity 26.6%; Pred. No. 3.2e-41; 

Matches 184; Conservative 132; Mismatches 265; Indels 112; Gaps 16; 

Qy 13 NGTVLQDASGLQDSL — FSSESDNSLYFTYSGQSNTLEVRDLTYQVDIASQVPWFEQLAQ 70 

I : I : I I : I : I : I I : I : I I : I 
Db 118 NDDILEDIEAATSSWKFQAEPTFPIY LKFIDITYKVTTKGM 159 

Qy 71 FKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSG 130 

: | I : I II : : I : : I I : : I I I I : : I I : : I I : : I 

Db 160 TSSSEKSILNGISGSAY— PGELLALMGPSGSGKTTLLNALGGRFNQQNI-GG 209 



Qy 131 QIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVED 190 



: I : I : I : : I I I I I : I I I : I I I : I : I I I : I :::::: I 
Db 210 SVSYNDKPYSKHLKTR-IGFVTQDDVLFPHLTVKETLTYTALLRLPKTLTEQEKEQRAAS 268 

Qy 191 VIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHN 250 

II II I : I : I : I : : I I I I M I II : I I I I : : : II : I : I I I I I I III II 
Db 269 VIQELGLERCQDTMIGGSFVRGVSGGERKRVCIGNEIMTNPSLLLLDEPTSSLDSTTALK 328 

Qy 251 LVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHP 310 

: I I : I I : : : : : | | I I : I | | : : : : : I : : I I I : : I I : I I I 
Db 329 IVQMLHCIAKAGKTIVTTIHQPSSRLFHRFDKLWLSRGSLLYFGKASEAMSYFSSIGCS 388 

Qy 311 CPRYSNPADFYVDLT SIDRRSKER E 335 

I I I : I : I I I : I I : : 

Db 389 PLLAMNPAEFLLDLWGNMNDISVPSALKEKMKIIRLELYVRNVKCDVETQYLEEAYKTQ 448 

Qy 336 VATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLTLTQDTDCGTAVELPGMI 395 

:| :| I : :| : I:: I I :| I 
Db 449 IAVMEKMKLMAPVPLDE EVKLMIT CPKREWGLSWW 483 

Qy 396 EQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIG 455 

||:||| I : I :: ::|:| |:: : III I 

Db 484 EQYCLLSLRGIKERRHDYFSWL-RVTQVLSTAIILGLLWWQSDITS-QRPTRSGLLFFIA 541 

Qy 456 ALIPFNVILDVVSKCHSERSMLYYELEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIY 515 

I : : I |:| I I I :| III: Ml ::: : :| 

Db 542 VFWGFFPVFTAI FTFPQERAMLSKERESNMYRLSAYFVARTTSDLPLDLILPVLFLVVVY 601 

Qy 516 WLTNLRPVPELFLLHFLLVWLVVFCCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGF 575 

:: I I I I I I 1:1 : : : I I I I :: : =11 I = 

Db 602 F^GLRLRAESFFLSVLTVFLCIVAAQGLGLAIGASmDLKKATTLASWVMTmLAGGY 661 

Qy 576 MINLDNLWIVPAWISKLSFLRWCFSGLMQIQFNGHLYTTQIGNFTFSILGDTMISAMDLN 635 

: : I I I I : I I : I : : : I : I : I : : I : 
Db 662 FVKKVP FFI — AWI RFMS FN YHT YKLLVKVQYE EIMESVNGEEIESGL 707 

Qy 636 SHPLYAIYLIVIGISYGFLFLYYLSLKLIKQKS 668 

: I: ::| |: : I I I : : I I 

Db 708 - KEVS ALVAMI I GYRLVAYFSLRRMKLHS 735 



RESULT 8 
FYFFW 

white protein - fruit fly (Drosophila melanogaster) 
C;Species: Drosophila melanogaster 
C;Date: 31-Dec-1990 #sequence_revision 17-Feb-1995 #text_change 19-Jan-2001 
C;Accession: S08635; S07263; S10240 
R;Pepling, M.; Mount, S.M. 
Nucleic Acids Res. 18, 1633, 1990 
A;Title: Sequence of a cDNA from the Drosophila melanogaster white gene. 
A;Reference number: S08635; MUID:90221897; PMID:2109311 
A;Accession: S08635 
A;Molecule type: mRNA 
A;Residues: 1-687 <PEP> 
A;Cross-references: EMBL:X51749; NID:g8825; PIDN:CAA36038.1; PID:g8826 
R;O'Hare, K.; Murphy, C.; Levis, R.; Rubin, G.M. 
J. Mol. Biol. 180, 437-455, 1984 
A;Title: DNA sequence of the white locus of Drosophila melanogaster. 
A;Reference number: S07263; MUID:85134865; PMID:6084717 
A;Accession: S07263 
A;Molecule type: DNA 
A;Residues: 1-24,'LIFEIPYHCRVTAD',30-334,'ITLHLNSYPAWVPSVLPTTIRRTFTYRCWPLCPDGRSSPVIGSPRYG',372-687 <OHA1> 
A;Cross-references: EMBL:X02974 
A;Experimental source: strain Canton S 
R;O'Hare, K. 
submitted to the EMBL Data Library, June 1985 
A;Reference number: S10240 
A;Accession: S10240 
A;Molecule type: DNA 
A;Residues: 1-24,'LIFEIPYHCRVTAD',30-687 <OHA2> 
A;Cross-references: EMBL:X02974; NID:g10873; PIDN:CAA26716.1; PID:g10874 
A;Experimental source: strain Canton S 
C;Genetics: 
A;Gene: white; w 
A;Cross-references: FlyBase:FBgn0003996 
A;Introns: 24/3; 116/1; 334/2; 439/3; 483/3 
C;Superfamily: fruit fly white protein; ATP-binding cassette homology 
C;Keywords: ATP; glycoprotein; nucleotide binding; P-loop; transmembrane protein 
F;113-317/Domain: ATP-binding cassette homology <ABC> 
F;130-137/Region: nucleotide-binding motif A (P-loop) 
F;261-265/Region: nucleotide-binding motif B 
F;67,93,472,554,651/Binding site: carbohydrate (Asn) (covalent) #status predicted 

Query Match 18.0%; Score 630.5; DB 1; Length 687; 

Best Local Similarity 29.4%; Pred. No. 5.4e-41; 



Matches 179; Conservative 112; Mismatches 246; Indels 71; Gaps 13; 

Qy 88 IRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGG--KMKSGQIWINGQPSTPQLVR 145 

: : 1 : I : : II : : 1 M 1 1 : : 1 1 : : 1 1 II : 1 1 1 1 : : : 

Db 113 LKNVCGVAYPGELLAVMGSSGAGKTTLLNALAFRSPQGIQVSPSGMRLLNGQPVDAKEMQ 172 

Qy 146 KCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRV 205 

|:|:| 1 : :|l II 1 1 1 :|:M : II 1 1 : 1 1 1 1 1 : 1 : 1 : 

Db 173 ARCAYVQQDDLFIGSLTAREHLIFQAMVRMPRHLTYRQRVARVDQVIQELSLSKCQHTII 232 

Qy 206 G-NTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRL 264 

I 1 : 1 : 1 1 1 1 1 : 1 :: : 1 : 1 : 1 1 1 1 1 1 1 1 II 1 1 1 1 1 :: 1 1 : 1 :: : 

Db 233 GVPGRVKGLSGGERKRLAFASEALTDPPLLICDEPTSGLDSFTAHSVVQVLKKLSQKGKT 292 

Qy 265 VLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPADFYV-- 322 

|::::||| |::| III :|M 1 :|l : 1 :|: :| II MINI 

Db 293 VILTIHQPSSELFELFDKILLMAEGRVAFLGTPSEAVDFFSYVGAQCPTNYNPADFYVQV 352 

Qy 323 DLTSIDRRSKEREVATVEKA QSLAALFLEK — VQGFDDFLWKAEAKEL 368 

: : I I I : I : : 1 III III I : : : 1 1 

Db 353 LAVVPGREIESRDRIAKICDNFAISKVARDMEQLLATKNLEKPLEQPENGYTYKAT 408 

Qy 369 NTSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSL 428 

|| :: | : :: : : : :::: 

Db 409 WFMQFRAVLWRSWLSVLKEPLLVKVRLIQTTMVAI 443 

Qy 429 IIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDVVSKCHSERSMLYYELEDGLYTA 488 

: I I =: I 

Db 444 LIGLIFLGQQLTQVGVMNINGAIFLFLTNMTFQNVFATINVFTSELPVFMREARSRLYRC 503 



Qy 489 GPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFL LVWLVVFCCRTM 544 

111:111 ::: I : I I : I I I I I I I : 

Db 504 DTYFLGKTIAELPLFLTVPLVFTAIAYPMIGLR AGVLHFFNCLALVTLVANVSTSF 559 

Qy 545 ALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQ 604 

I : |: : I I II :| :: : hi II: h lh 

Db 560 GYLISCASSSTSMALSVGPPVIIPFLLFGGFFLNSGSVPVYLKWLSYLSWFRYANEGLLI 619 

Qy 605 IQF NGHLYTTQIGNFTFSILGDTMI SAMDLNSHPLYAIYLIVIGISYGFLF 655 

I : I : I II I : : I I I I II : I : : : I I 
Db 620 NQWADVEPGEISCTS-SNTTCPSSGKVILETLNFSAADL PLDYVGLAILIVS — FRV 673 

Qy 656 LYYLSLKL 663 

I I h h I 
Db 674 LAYLALRL 681 



RESULT 9 
T08934 

hypothetical protein F27G19.20 - Arabidopsis thaliana 
C;Species: Arabidopsis thaliana (mouse-ear cress) 
C;Date: 11-Jun-1999 #sequence_revision 11-Jun-1999 #text_change 17-Mar-2000 
C;Accession: T08934 
R;Bevan, M.; Hilbert, H.; Braun, M.; Holzer, E.; Brandt, A.; Duesterhoeft, A.; 
Bancroft, I.; Mewes, H.W.; Mayer, K.F.X.; Lemcke, K.; Schueller, C. 
submitted to the Protein Sequence Database, May 1999 
A;Reference number: Z16519 
A;Accession: T08934 
A;Molecule type: DNA 
A;Residues: 1-635 <BEV> 
A;Cross-references: EMBL:AL078467; GSPDB:GN00062; ATSP:F27G19.20 
A;Experimental source: cultivar Columbia; BAC clone F27G19 
C;Genetics: 
A;Gene: ATSP:F27G19.20 
A;Map position: 4 
A;Introns: 38/3; 253/1; 304/1; 414/3 
C;Superfamily: fruit fly white protein; ATP-binding cassette homology 

Query Match 18.0%; Score 629.5; DB 2; Length 635; 

Best Local Similarity 28.2%; Pred. No. 5.8e-41; 

Matches 194; Conservative 127; Mismatches 261; Indels 107; Gaps 23; 

Qy 23 LQDSLFSSESDNSLYFTYSGQSN TLEVRDLT YQVDIASQVPWFEQLAQFKI PWRSHS 79 

:: : : I II I: ::| lh :| I I : I 
Db 10 VETPIAKTNDDRSLPFSIFKKANNPVTLKFENLVYTVKLKDSQGCF G 56 

Qy 80 SQDSCE--LGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMK-SGQIWING 136 

I I :: h I : h : I h : I I I h I I I : II II I :| I I 

Db 57 KNDKTEERTILKGLTGIVKPGEILAMLGPSGSGKTSLLTALGGRVGEGKGKLTGNISYNN 116 

Qy 137 QPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELR 196 

: I : : I: : I I I I I I I I I M I I I : I I I : I : : : h : h I I 

Db 117 KPLS-KAVKRTTGFWQDDALYPNLTVTETLVFTALLRLPNSFKKQEKIKQAKAVMTELG 175 

Qy 197 LRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLS 256 



I • I * I • I I I I I I I I I • I I I i i • • i ii • i m I i i I i i i r ii vi- i 

Db 176 LDRCKDTIIGGPFLRGVSGGERKRVSIGQEILINPSLLFLDEPTSGLDSTTAQRIVSILW 235 

Qy 257 RLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGH-PCPRYS 315 

I I : I I I : : : I I I : I I : I I : I I I : I : I 

Db 236 ELARGGRTWTTIHQP SKGNPVYFGLGSNAMDYFASVGYSPLVERI 281 

Qy 316 NPADFYVDLT SIDRRSKEREVATVEKAQSLAALF LEKVQG 355 

I I : I I : I : I : : I I I I : I I : : : I : I 

Db 282 NPSDFLLDIANGKPLLVISCWPSVGSDESQRPEAM — KA-ALVAFYKTNLLDSVINEVKG 338 

Qy 356 FDDFLWK-AEAKELNTSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQIS NDF 410 

II I I: : |:|: II :|l |::| : : I 

Db 339 QDDLCNKPRESS RVATNT Y GDWPTT WWQQFCVLLKRGLKQRRHDSF 384 

Qy 411 RDLPTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDVVSKC 470 

: : : : : | : | | : : : I I INI: I : : 

Db 385 SGMKV AQIFIVSFLCGLLWWQTKISRL — QDQIGLLFFISSFWAFFPLFQQIFTF 437 

Qy 471 HSERSMLYYELEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLH 530 

||:|| I |:| III ::::|:M : : ||: | I : 

Db 438 PQERAMLQKERSSGMYRLSPYFLSRWGDLPMELILPTCFLVITYWMAGLNHNLANFFVT 497 

Qy 531 FLLVWLVVFCCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWIS 590 

I:: : I : II |:: :: : : :| I |: : ::| :|| 

Db 498 LLVLLVHVLVSGGLGLALGALVMDQKSATTLGSVIMLTFLLAGGYYVQHVPVFI — SWIK 555 

Qy 591 KLSFLRWCFSGLMQIQF NGHLYTTQI GNFTFS I LGDTMI SAMDLNSHPLY 640 

: I : : | : | : III : I : I I : I I : 

Db 556 YVSIGYYTYKLLILGQYTANELYPCGDNGKL-RCRVGDF EGIKHIGFNSGLVS 607 

Qy 641 AIYLIVIGISYGFLFLYYLSLKLI-KQKS 668 

I : I : : I : I : : I I I I I 
Db 608 ALALTAMLWY — RVIAYIALTRIGKTKS 634 



RESULT 10 
T47648 

ABC transporter-like protein - Arabidopsis thaliana 

N;Alternate names: protein T15C9.80 

C;Species: Arabidopsis thaliana (mouse-ear cress) 
C;Date: 20-Apr-2000 #sequence_revision 20-Apr-2000 #text_change 19-May-2000 
C;Accession: T47648 
R;Mewes, H.W.; Rudd, S.; Lemcke, K.; Mayer, K.F.X. 
submitted to the Protein Sequence Database, April 2000 
A;Reference number: Z24470 
A;Accession: T47648 
A;Status: preliminary 
A;Molecule type: DNA 
A;Residues: 1-720 <MEW> 
A;Cross-references: EMBL:AL132970 
A;Experimental source: cultivar Columbia; BAC clone T15C9 
C;Genetics: 
A;Map position: 3 
A;Note: T15C9.80 
C;Superfamily: Arabidopsis thaliana probable ATP-binding cassette protein 
F12L6.1; ATP-binding cassette homology 



Query Match 17.2%; Score 601.5; DB 2; Length 720; 

Best Local Similarity 24.1%; Pred. No. 1.1e-38; 

Matches 178; Conservative 136; Mismatches 301; Indels 123; Gaps 18; 



Qy 14 GTVLQDASGLQDSLFSSESDNSLYFTYSGQS NTLEVRDLTYQVDIAS 60 

I : I : : I : : I : : I II I : I I I I : 

Db 11 GQLLKNVSDVRKVEVGDETPVHEFFDRDGSSLDGDNDHLMRPVPFVLSFNNLTYNVSVRR 70 

Qy 61 QVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120 

: : : : : I I I I : I : I : I : I I : : I I : : I : I I I : : : I : I : 

Db 71 KLDFHD LVPWRRTSFSKTKTL-LDNISGETRDGEILAVLGASGSGKSTLIDALAN 124 

Qy 121 RGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFS 180 

I | : | | : : | | : : : : : I : I I I I I I I I I II I I : I I I I : 
Db 125 RIAKGSLK-GTVTLNGEALQSRMLKVISAYVMQDDLLFPMLTVEETLMFAAEFRLPRSLP 183 

Qy 181 QAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPT 240 

: : : : II : : I : I : I I I : I : I I : I I I I I I I I I I I : : : : I : I I I I I I 
Db 184 KSKKKLRVQALIDQLGIRNAAKTIIGDEGHRGISGGERRRVSIGIDIIHDPIVLFLDEPT 243 

Qy 241 SGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQM 300 

11111:1 : I I I : I : : : : : I : I I I : I I : : : : I : : I : : 
Db 244 SGLDSTSAFMWKVLKRIAESGSIIIMSIHQPSHRVLSLLDRLIFLSRGHTVFSGSPASL 303 

Qy 301 VQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDF- 359 

: I I : I I I : I : I I : : : I : I : I 

Db 304 PSFFAGFGNPIPENENQTEFALDL IRELEGSAG GTRGLVEFN 345 

Qy 360 -LWKAEAKELNTSTHT VSLTLTQDTDC GTAVEL 391 

I : I : I I I : I I I : I : : 

Db 346 KKWQEMKKQSNPQTLTPPASPNPNLTLKEAISASISRGKLVSGGGGGSSVINHGGGTLAV 405 

Qy 392 PGMIEQF STLIRRQISNDFRDLPTLLIHGSEACLMSLIIGFLYY GHGAKQ- 441 

I I I I II I I I I : : : I : ::: I :: 

Db 406 PAFANPFWIEIKTLTRRSILNSRRQPELLGMRLATVIVTGFILATVFWRLDNSPKGVQER 465 

Qy 442 LSFMDTAALLFMIGALIPFNVILDVVSKCHSERSMLYYELEDGLYTAGPYFFAKILGELP 501 

II I I I : I I : I I | : : I 

Db 466 LGF FAFAMSTMFYTCADALPVFLQERYI FMRETAYNAYRRS S YVLSHAI VTFP 518 

Qy 502 EHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLVVFCCRTMALAASAMLPTFHMSSFF 561 

: : I : : I I I I : I : : : : I : : I : 

Db 519 SLIFLSIAFAWTFWAVGLEGGLMGFLFYCLIILASFWSGSSFVTFLSGWPHVMLGYTI 578 

Qy 562 CNALYNS FYLTAGFMI NLDNL WIVPAWISKLSFLRWCFSGLMQIQFNG 609 

I: I I :M II I : II I II ::: : ::| :|: 
Db 579 WAILAYFLLFSGFFINRDRIPQYWI WFHYLSLVKYPYEAVLQNEFSDPTECFVRGV 635 

Qy 610 HLY-TTQIGNFTFSI LGDTMI SAMDLNSHPLYAI YLI 645 

| : : : I I : : I :: : : I I 

Db 636 QLFDNSPLGELTYGMKLRLLDSVSRSIGMRISSSTCLTTGADVLKQQGVTQLSKWNCLLI 695 

Qy 646 VIGISYGFLFLYYLSLKL 663 

: I : I hllll 
Db 696 TVGFGFLFRILFYLCLLL 713 



RESULT 11 
B96573 

protein F12M16.17 [imported] - Arabidopsis thaliana 
C;Species: Arabidopsis thaliana (mouse-ear cress) 
C;Date: 02-Mar-2001 #sequence_revision 02-Mar-2001 #text_change 23-Mar-2001 
C;Accession: B96573 
R;Theologis, A.; Ecker, J.R.; Palm, C.J.; Federspiel, N.A.; Kaul, S.; White, O.; 
Alonso, J.; Altaf, H.; Araujo, R.; Bowman, C.L.; Brooks, S.Y.; Buehler, E.; 
Chan, A.; Chao, Q.; Chen, H.; Cheuk, R.F.; Chin, C.W.; Chung, M.K.; Conn, L.; 
Conway, A.B.; Conway, A.R.; Creasy, T.H.; Dewar, K.; Dunn, P.; Etgu, P.; 
Feldblyum, T.V.; Feng, J.; Fong, B.; Fujii, C.Y.; Gill, J.E.; Goldsmith, A.D.; 
Haas, B.; Hansen, N.F.; Hughes, B.; Huizar, L. 
Nature 408, 816-820, 2000 
A;Authors: Hunter, J.L.; Jenkins, J.; Johnson-Hopson, C.; Khan, S.; Khaykin, E.; 
Kim, C.J.; Koo, H.L.; Kremenetskaia, I.; Kurtz, D.B.; Kwan, A.; Lam, B.; 
Langin-Hooper, S.; Lee, A.; Lee, J.M.; Lenz, C.A.; Li, J.H.; Li, Y.; Lin, X.; 
Liu, S.X.; Liu, Z.A.; Luros, J.S.; Maiti, R.; Marziali, A.; Militscher, J.; 
Miranda, M.; Nguyen, M.; Nierman, W.C.; Osborne, B.I.; Pai, C.; Peterson, J.; 
Pham, P.K.; Rizzo, M.; Rooney, T.; Rowley, D.; Sakano, H. 
A;Authors: Salzberg, S.L.; Schwartz, J.R.; Shinn, P.; Southwick, A.M.; Sun, H.; 
Tallon, L.J.; Tambunga, G.; Toriumi, M.J.; Town, C.D.; Utterback, T.; van Aken, 
S.; Vaysberg, M.; Vysotskaia, V.S.; Walker, M.; Wu, D.; Yu, G.; Fraser, C.M.; 
Venter, J.C.; Davis, R.W. 
A;Title: Sequence and analysis of chromosome 1 of the plant Arabidopsis. 
A;Reference number: A86141; MUID:21016719; PMID:11130712 
A;Accession: B96573 
A;Status: preliminary 
A;Molecule type: DNA 
A;Residues: 1-590 <STO> 
A;Cross-references: GB:AE005173; NID:g7769856; PIDN:AAF69534.1; GSPDB:GN00141 
C;Genetics: 
A;Gene: F12M16.17 
A;Map position: 1 
C;Superfamily: fruit fly white protein; ATP-binding cassette homology 

Query Match 17.0%; Score 595.5; DB 2; Length 590; 

Best Local Similarity 27.0%; Pred. No. 2.4e-38; 

Matches 172; Conservative 123; Mismatches 252; Indels 89; Gaps 17; 

Qy 44 SNTLEVRDLTYQVDIASQVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAI 103 

| | : : | : | : | || I I : :::::: I I I : : I I 

Db 16 RLETKNLSYR — IGGNTPKFSNLCGL LSEKEEKVILKDVSCDARSAEITA1 66 

Qy 104 

I I I I : : I I : : : I : I I : I I I : : I I : I I I I I I I I I I 

Db 67 

Qy 164 

I : I I I :|: :| II I |::|:| hllllllllll 

Db 126 1ALLRLKT KRKDAA — AKVKRL I QELGLEHVADS RI GQGS RS GI S GGERRRVS I 183 

Qy 224 

I I : I : : I : : : : I I I I I I I I I : I : I I I : I :::::: I I 

Db 184 

Qy 283 VLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPADFYVDLT-SID--RRSKEREVATV 339 



::|:::| : h : I II II I : : : I : I * : I II:: 

Db 244 IVLLSNGMWQNGSVYSLHQKIKFSGHQIPRRVNVLEYAIDIAGSLEPIRTQSCREISCY 303 

Qy 340 EKAQSLAALFLEKVQGFDDFLWK AEAKELNTS-THTVSLTLTQDTDCGTAVELPGM 394 

: : : II : I I : I : I : I = 
Db 304 GHSKT WKSCYI SAGGELHQSDSHSNS V 330 

Qy 395 IEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGFLY--YGHGAKQLSFMDTAALLF 452 

: | : | : I III : I : I I : I : I I : I : : I I 

Db 331 LEEVQILGQRSCKNIFRTKQLFTTRALQASIAGLILGSIYLNVGNQKKEAKVLRTGFFAF 390 

Qy 453 MIGALIP FNVILDWSKCHSERSMLYYELEDGLYTAGPYFFAKILGELPEHCAYV 507 

: : | : : I : I : I I I I I I : I 

Db 391 ILTFLLSSTTEGLPIFL QDRRI LMRET S RRAYRVLS YVLADTLI FI PFLLI I S 443 

Qy 508 IIYAMPIYWLTNLRPVPELFLLHFLLVWLVVFCCRTMALAASAMLPTFHMSSFFCNALYN 567 

: : : I I : I I I II : I I | : : I : I : : I I : : I I I : : I 

Db 444 MLFATPVYWLVGLRRELDGFLYFSLVIWIVLLMSNSFVACFSALVPNFIMGTSVISGLMG 503 

Qy 568 SFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQIQFNGHLYTTQIGNFTFSILGDT 627 

||:| :|: I I : : :: II :: I II :: M 
Db 504 SFFLFSGYFIAKDRIPVYWEFMHYLSLFKYPFECLMINEYR GDV 547 

Qy 628 MI SAMDLNSHPLYA IYLIVIGIS-YGFLFLYY 658 

: || : : : : : I I I I : I 

Db 548 FLKQQDLKESQKWSNLGIMASFIVGYRVLGFFILWY 583 



RESULT 12 
T47652 

ABC transporter-like protein - Arabidopsis thaliana 

N;Alternate names: protein T26I12.10 

C;Species: Arabidopsis thaliana (mouse-ear cress) 
C;Date: 20-Apr-2000 #sequence_revision 20-Apr-2000 #text_change 19-May-2000 
C;Accession: T47652 
R;Monfort, A.; Casacuberta, E.; Puigdomenech, P.; Mewes, H.W.; Lemcke, K.; 
Mayer, K.F.X.; Quetier, F.; Salanoubat, M. 
submitted to the Protein Sequence Database, February 2000 
A;Reference number: Z24471 
A;Accession: T47652 
A;Status: preliminary 
A;Molecule type: DNA 
A;Residues: 1-725 <MON> 
A;Cross-references: EMBL:AL132954 
A;Experimental source: cultivar Columbia; BAC clone T26I12 
C;Genetics: 
A;Map position: 3 
A;Note: T26I12.10 
C;Superfamily: Arabidopsis thaliana probable ATP-binding cassette protein 
F12L6.1; ATP-binding cassette homology 

Query Match 16.7%; Score 582.5; DB 2; Length 725; 

Best Local Similarity 27.6%; Pred. No. 3.2e-37; 

Matches 169; Conservative 126; Mismatches 238; Indels 79; Gaps 19; 

Qy 62 VPW FEQLAQFKI PWR SHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLL 115 

||: I I I: : I | |: : : ::| : I :||::|:|l 



Db 69 VPYVLNFNNL-QYDVTLRRRFGFSRQNGVKTLLDDVSGEASDGDILAVLGASGAGKSTLI 127 



Qy 116 DVITGRGHGGKMKSGQIWINGQPSTPQLVRKCV-AHVRQHDQLLPNLTVRETLAFIAQMR 174 

| : | | | : : | : : M : : I : I : I I I I I I I I : I I I I : : I 

Db 128 DALAGRVAEGSLR-GSVTLNGEKVLQSRLLKVISAYVMQDDLLFPMLTVKETmFASEFR 186 

Qy 175 LPRTFSQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGIL 234 

| | | : I : : : : : I M : I : I I I I I I : I : II I I I I I I I I II I I : : : : I : I 
Db 187 LPRSLSKSKKMERVEALIDQLGLRNAANTVIGDEGHRGVSGGERRRVSIGIDIIHDPIVL 246 

Qy 235 ILDEPTSGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYL 294 

I M I I I I I II I : I I I : I : : I : : I : I I I : I I I : : : : : I : : 
Db 247 FLDEPTSGLDSTNAFMWQVLKRIAQSGSIVIMSIHQPS7VRIVELLDRLIILSRGKSVFN 306 

Qy 295 GAAQQMVQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQ 354 

| : : : I : I I I I :: I : I I II I I : III 
Db 307 GSPASLPGFFSDFGRPI PEKENI SEFALDLVRELEGSNEGTKALVD FNEK — 356 

Qy 355 GFDDFLWK AEAKELNTSTHTVSLTLTQDTDCG TAVEL- 391 

I: |:| ||:| : : |::| 

Db 357 WQQNKI S L I QSAPQTNKLDQDRS LS LKEAINASVS RGKLVS GS SRSNPT SMETV 410 

Qy 392 PGMI EQFSTLI RRQI SNDFR- DLPTLLIHGSEACLMSLIIGFL-YYGHG 438 

| : I I I : I : I I : I : : : I I I : : : I : I 
Db 411 S S YAN P S L FET F- 1 LAKRYMKNWI RMP ELVGT RI AT VMVT G CLLATVYWKLDHT PRG 466 

Qy 439 AKQLSFMDTAALLFMIGALIP--FNVILDVVSKCHSERSMLYYELEDGLYTAGPYFFAKI 496 

I : : | : : : | | III I I : I I I : 

Db 467 AQE RLTLFAFWPTMFYCCLDNVPVFIQERYI FLRETTHNAYRTSS YVI SHS 518 

Qy 497 LGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLVVFCCRTMALAASAMLPTFH 556 

I ||: | :::: :| I I |: : I I:: : :: I ::l 

Db 519 LVSLPQLLAPSLVFSAITFWTVGLSGGLEGFVFYCLLIYASFWSGSSWTFISGWPNI- 577 

Qy 557 MSSFFCNALYNSF-YLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQIQFN--GHLYT 613 

| : : I : : I : I I : I I : I : I I : : : : : : I : : 

Db 578 MLCYMVSITYl^YCLLLSGFYVNRDRIPFYWTWFHYISILKYPYEAVLINEFDDPSRCFV 637 

Qy 614 TQIGNFTFSILG 625 

: I : : I I 
Db 638 RGVQVFDSTLLG 649 



RESULT 13 
B88474 

protein C05D10.3 [imported] - Caenorhabditis elegans 
C;Species: Caenorhabditis elegans 
C;Date: 10-May-2001 #sequence_revision 10-May-2001 #text_change 15-Jun-2001 
C;Accession: B88474 
R;anonymous, The C. elegans Sequencing Consortium. 
Science 282, 2012-2018, 1998 
A;Title: Genome sequence of the nematode C. elegans: a platform for 
investigating biology. 
A;Reference number: A75000; MUID:99069613; PMID:9851916 
A;Note: see websites genome.wustl.edu/gsc/C_elegans/ and 
www.sanger.ac.uk/Projects/C_elegans/ for a list of authors 
A;Note: published errata appeared in Science 283, 35, 1999; Science 283, 2103, 
1999; and Science 285, 1493, 1999 
A;Accession: B88474 
A;Status: preliminary 
A;Molecule type: DNA 
A;Residues: 1-559 <STO> 
A;Cross-references: GB:chr_III; PIDN:AAA20989.1; PID:g532111; GSPDB:GN00021; 
CESP:C05D10.3 
A;Note: similar to D. melanogaster white protein 
C;Genetics: 
A;Gene: C05D10.3 
A;Map position: 3 
C;Superfamily: fruit fly white protein; ATP-binding cassette homology 

Query Match 16.6%; Score 581.5; DB 2; Length 559; 

Best Local Similarity 28.9%; Pred. No. 2.7e-37; 

Matches 155; Conservative 103; Mismatches 235; Indels 43; Gaps 10; 

Qy 88 IRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVRKC 147 

: I : I | | : : | | | : | | | | | : : | : : | : | I I I I : I : : : I : 

Db 10 LHNVSGMAESGKLLAILGSSGAGKTTLMNVLTSRNLTNLDVQGSILIDGRRANKWKIREM 69 

Qy 148 VAHVRQHDQLLPNLTVRETLAFIAQMRL-PRTFSQAQRDKRVEDVIAELRLRQCANTRVG 206 

I I : I I I : :| || I |:|::|: : :| :| I I I I : : : I : : I I : I : I 
Db 70 SAFVQQHDMFVGTMTAREHLQFMARLRMGDQYYSDHERQLRVEQVLTQMGLKKCADTVIG 129 

Qy 207 -NTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTMNLVTTLSRLAKGNRLV 265 

::|:| ||::|:| :M III II I I I I I I : I I :: I I I I I 
Db 130 IPNQLKGLSCGEKKRLSFASEILTCPKILFCDEPTSGLDAFMAGHWQALRSLADNGMTV 189 

Qy 266 LISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPADFYVDLT 325 

:|::| I I I :: I I: I I I I II II II I hill I MM : 
Db 190 IITIHQPSSHVYSLFNNVCLMACGRVIYLGPGDQAVPLFEKCGYPCPAYYNPADHLI 246 

Qy 326 SIDRRSKEREVATVEKAQSLAALFLEKV-QGF DDFLWKAEAKELN TSTH 373 

| : | : : : : : : I : I II I I M I : 

Db 247 RTLAVIDSDRATSMKTISKIRQGFLSTDLGQSVLAIGNANKLRAASFVTGSD 298 

Qy 374 TVSLTLT QDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLII 430 

I || II: I I I I M I : : : : I 

Db 299 TSEKTKTFFNQDYNA SFWTQFLALFWRSWLTVI RDPNLLSVRLLQI LITAFIT 351 

Qy 431 GFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDWSKCHSERSMLYYELEDGLYTAGP 490 

| ::: : : ::| : I : I :| :: | :|:| 

Db 352 GIVFFQTPVTPATIISINGIMFNHIRNMNFMLQFPNVPVITAELPIVLRENANGVYRTSA 411 

Qy 491 YFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLVVFCCRTMALAASA 550 

M I I : I I I : : Ml : I I : : I I : : I I : I : : I 

Db 412 YFLAKNIAELPQYIILPILYNTIVYWMSGLYP N FWN Y C FAS L VT I L I T NVAI S I S Y 467 

Qy 551 MLPTFHMSSFFCNALYNSFYLT AGFMINLDNLWIVPAWISKLSFLRWCFSGL 602 

: | :: : I : Mil: Ml IM :: : I 

Db 468 AVATIFANTDVAMTILPIFWPIMAFGGFFITFDAIPSYFKWLSSLSYFKYGYEAL 523 



RESULT 14 
T47650 



ABC transporter-like protein - Arabidopsis thaliana 

N;Alternate names: protein T15C9.110 

C;Species: Arabidopsis thaliana (mouse-ear cress) 
C;Date: 20-Apr-2000 #sequence_revision 20-Apr-2000 #text_change 19-May-2000 
C;Accession: T47650 
R;Mewes, H.W.; Rudd, S.; Lemcke, K.; Mayer, K.F.X. 
submitted to the Protein Sequence Database, April 2000 
A;Reference number: Z24470 
A;Accession: T47650 
A;Status: preliminary 
A;Molecule type: DNA 
A;Residues: 1-708 <MEW> 
A;Cross-references: EMBL:AL132970 
A;Experimental source: cultivar Columbia; BAC clone T15C9 
C;Genetics: 
A;Map position: 3 
A;Note: T15C9.110 
C;Superfamily: Arabidopsis thaliana probable ATP-binding cassette protein 
F12L6.1; ATP-binding cassette homology 

Query Match 16.6%; Score 580.5; DB 2; Length 708; 

Best Local Similarity 26.8%; Pred. No. 4.5e-37; 

Matches 185; Conservative 123; Mismatches 287; Indels 95; Gaps 19; 

Qy 40 YSGQ-SNTL E VRD LT YQ VD I AS Q VP W FEQLAQFKI PWR SHSSQDSCELGIRNL 91 

I I : : I : I I : I I I : I I : : I I I : : : : 

Db 37 YPGENAPTQHILDIAPAAETRS-VPFLLSFNNLSYNVVLRRRFDFSRRKTASVKTLLDDI 95 

Qy 92 SFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVRKCV-AH 150 

: : I I : : I I : : I I I I : : : I : I : I I : I I : : I I : : I : I : 

Db 96 TGEARDGEILAVLGGSGAGKSTLIDA1AGRVAEDSLK-GTVTLNGEKVLQSRLLKVISAY 154 

Qy 151 VRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRVGNTYV 210 

| | | | I 111:111 I :: MM: :::: :IM =1 :| M 1 = 1 = I s 
Db 155 VMQDDLLFPMLTVKETLMFASEFRLPRSLPKSKKMERVETLIDQLGLRNAADTVIGDEGH 214 

Qy 211 RGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLVLISLH 270 

I I I || I I II I I I I I : : : : I : I I I I I I I I I II I : I I I ' I : : I :: I : I 

Db 215 RGVSGGERRRVSIGIDIIHDPILLFLDEPTSGLDSTNAFMWQVLKRIAQSGSWIMSIH 274 

Qy 271 QPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPADFYVDLTSIDRR 330 

| | : | | | : : : : : | : : I : : : I : I I I I I : I : I : 
Db 275 QPSARIIGLLDRLIILSHGKSVFNGSPVSLPSFFSSFGRPIPEKENITEFALDVIRELEG 334 

Qy 331 SKEREVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSL TL 379 

II II I II I : I : I I I I 

Db 335 SSEGTRDLVE FNEK WQQNQTARATTQSRVSLKEAIAASVSRGKL 378 

Qy 380 TQDTDCGTAVEL PGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIG 431 

: : : I : I I I :| I I I : : : I:: 

Db 379 VSGSSGANPISMETVSSYANPPLAETF-ILAKRYIKNWIRT 437 

Qy 432 FLYY GHGAKQ- L S FMDT AALL FMI GAL I P FNVT LDVVS KCH S ERSML Y YELEDGL 485 

: I : I I :: : I II I I : I I : I 

Db 438 TVYWRLDNTPRGAQERMGF FAFGMSTMFYCCADNIPVFIQERYIFLRETTHNA 490 



Qy 486 YTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLVVFCCRTMA 545 



I I : I I I : I I : I : I I | | : I ::: : :: 

Db 491 YRTSSYVISHALVSLPQLLALSIAFAATTFWTVGLSGGLESFFYYCLIIYAAFWSGSSIV 550 

Qy 546 LAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRW 597 

I :: I II. I : I Mill:: | : I I :: 
Db 551 TFISGLIPNVMMSYMVTIAYLSYCLLLGGFYINRDRIPLYWIWFHYISLLKYPYEAVLIN 610 

Qy 598 CFSGLMQIQFNG HLYTTQI GNFTFS I LGDTMI SAMDLNSHP 638 

|| : I : I : I I : : : : I I : : I : I 

Db 611 EFDDPSRCFVKGVQV-FDGTLLAEVSHVMKVKLLDTLSGSLGTKITESTCLRTGPDLLMQ 669 

Qy 639 LYAIYLIVTGISYGFLF — LYYLSL 661 

I : I : : : I I I : I I I I 

Db 670 QGITQLSKWDCLWITLAWGLFFRILFYLSL 699 



RESULT 15 
A84509 

probable ABC transporter [imported] - Arabidopsis thaliana 
C;Species: Arabidopsis thaliana (mouse-ear cress) 
C;Date: 02-Feb-2001 #sequence_revision 02-Feb-2001 #text_change 16-Feb-2001 
C;Accession: A84509 
R;Lin, X.; Kaul, S.; Rounsley, S.D.; Shea, T.P.; Benito, M.I.; Town, C.D.; 
Fujii, C.Y.; Mason, T.M.; Bowman, C.L.; Barnstead, M.E.; Feldblyum, T.V.; Buell, 
C.R.; Ketchum, K.A.; Lee, J.J.; Ronning, C.M.; Koo, H.; Moffat, K.S.; Cronin, 
L.A.; Shen, M.; VanAken, S.E.; Umayam, L.; Tallon, L.J.; Gill, J.E.; Adams, 
M.D.; Carrera, A.J.; Creasy, T.H.; Goodman, H.M.; Somerville, C.R.; Copenhaver, 
G.P.; Preuss, D.; Nierman, W.C.; White, O.; Eisen, J.A.; Salzberg, S.L.; Fraser, 
C.M.; Venter, J.C. 
Nature 402, 761-768, 1999 
A;Title: Sequence and analysis of chromosome 2 of the plant Arabidopsis 
thaliana. 
A;Reference number: A84420; MUID:20083487; PMID:10617197 
A;Accession: A84509 
A;Status: preliminary 
A;Molecule type: DNA 
A;Residues: 1-649 <STO> 
A;Cross-references: GB:AE002093; NID:g4558665; PIDN:AAD22683.1; GSPDB:GN00139 
C;Genetics: 
A;Gene: At2g13610 
A;Map position: 2 
C;Superfamily: Arabidopsis thaliana probable ATP-binding cassette protein 
F12L6.1; ATP-binding cassette homology 

Query Match 16.6%; Score 579.5; DB 2; Length 649; 

Best Local Similarity 27.5%; Pred. No. 4.8e-37; 

Matches 167; Conservative 121; Mismatches 240; Indels 79; Gaps 17; 

Qy 88 IRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVRKC 147 

:::::: : : I I I : I I I I : : I II : : : I : : | : : : I : I : I 

Db 63 LKGVTCRAKPWEILAIVGPSGAGKSSLLEILAAR LI PQTGSVYVNKRPVDRANFKKI 119 

Qy 148 VAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRVGN 207 

: I I I I I I I I II I I I : : I I : : I I : : : I I I I I I I : 

Db 120 SGYVTQKDTLFPLLTVEETLLFSAKLRLKLPADELR — SRVKSLVHELGLEAVATARVGD 177 



Qy 208 TYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAK-GNRLVL 266 



II I : M I I I I I I I I I I : : : : I : M I I I I I I I I i I : I :: I :|: I :: 

Db 178 DSVRGISGGERRRVSIGVEVIHDPKVLILDEPTSGLDSTSALLIIDMLKHMAETRGRTII 237 

Qy 267 ISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIG-HPCPRYSNPADFYVD— 323 

:::||| | : |: III: :|: : |: |: I I I II I : I :| :: 
Db 238 LTIHQPGFRIVKQFNSVLLLANGSTLKQGSVDQLGVYLRSNGLHP-PLHENIVEFAIESI 296 

Qy 324 --LTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLTL-- 379 

: I I : I I I I : | : : : : : : II 

Db 297 ESITKQQRLQESRRAAHVLTPQTT LQEKRSEDSQGESKSGKFTLQQ 342 

Qy 380 TQDTDCGT AVELP GMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMS 427 

I : I I I II : I : II III : 

Db 343 LFQQTRVADVGT1VINIATEFTRDFANSRLEETMILTHRFSKNIFRTKELFACRTVQMLGSG 402 

Qy 428 LIIGFLYYG HGAKQ LS FMDTAALLFMI GALI P FNVI LDWS KCHS ERSML Y 478 

: : : | : : : I I : : : I : I I I I I I I I : I 

Db 403 IVLGLI FHNLKDDLKGARERVGLFAFILTFLLTSTI EALPI F LQEREILM 452 

Qy 479 YELEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLW 538 

I || I I I II I::: 1:111 I I II 

Db 453 KETSSGSYRVSSYAVANGLVYLPFLLILAILFSTPVYWLVGLNPSFiyiAFLHFSLLIWLIL 512 

Qy 539 FCCRTMALAASAMLPT FHMS S FFCNALYNS FYLTAGFMIN LDNLWIVPAWISKLSFL 595 

: :: : | | :: | I : : : : I I : I : I : I : : II :: : I 
Db 513 YTANSVWCFSALVPNFIVGNSVI SGVMGS FFLFSGYFISNHEI PGYWI FMHYISLF 569 

Qy 596 RWCFSGLMQIQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLY AIYLIVI 647 

:: I I : :|: : I | ::: || I : :: 

Db 570 KYPFEGFLINEFSKSNKCLEYG FGKCLVTEEDLLKEERYGEESRWRNWIMLCF 623 

Qy 648 GISYGFL 654 

: I I : 

Db 624 VLLYRFI 630 



Search completed: February 27, 2004, 07:18:53 
Job time : 15.9728 secs 



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM protein - protein search, using sw model 
Run on: February 27, 2004, 07:17:39 



; Search time 30.1994 Seconds 
(without alignments) 
4698.604 Million cell updates/sec 



Title: US-09-989-981A-4 
Perfect score: 3494 

Sequence: 1 MAEKTKEETQLWNGTVLQDA FLFLYYLSLKLIKQKSIQDW 672 



Scoring table: BLOSUM62 
Gapop 10.0 , Gapext 0.5 

Searched: 809742 seqs, 211153259 residues 

Total number of hits satisfying chosen parameters: 809742 

Minimum DB seq length: 0 
Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0% 

Maximum Match 100% 
Listing first 45 summaries 
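
The search header above names a Smith-Waterman ("sw") model with Gapop 10.0 and Gapext 0.5. As a rough illustration only, the Python sketch below computes a local alignment score with affine gaps (Gotoh-style recurrences). It is not GenCore's implementation: the names `toy_score` and `local_score` are hypothetical, the +5/-4 scoring merely stands in for BLOSUM62, and the convention that a gap of length k costs Gapop + k * Gapext is an assumption (though it appears consistent with the half-integer scores this report shows for hits with an odd number of indels).

```python
def toy_score(x, y):
    """Stand-in residue scoring (assumption): +5 match, -4 mismatch. Not BLOSUM62."""
    return 5 if x == y else -4


def local_score(a, b, score=toy_score, gap_open=10.0, gap_extend=0.5):
    """Best local alignment score of a vs b with affine gap penalties."""
    neg = float("-inf")
    rows, cols = len(a) + 1, len(b) + 1
    # H: best score of an alignment ending at (i, j), floored at 0 (local).
    # E: best score ending with a gap that consumes b[j-1] only.
    # F: best score ending with a gap that consumes a[i-1] only.
    H = [[0.0] * cols for _ in range(rows)]
    E = [[neg] * cols for _ in range(rows)]
    F = [[neg] * cols for _ in range(rows)]
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            # Either extend an existing gap or open a new one; both pay gap_extend.
            E[i][j] = max(E[i][j - 1], H[i][j - 1] - gap_open) - gap_extend
            F[i][j] = max(F[i - 1][j], H[i - 1][j] - gap_open) - gap_extend
            diag = H[i - 1][j - 1] + score(a[i - 1], b[j - 1])
            H[i][j] = max(0.0, diag, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best
```

With these toy parameters, `local_score("ACGTACGT", "ACGTTTACGT")` pays 10 + 2 * 0.5 = 11 for one two-residue gap against eight matches. Substituting a real BLOSUM62 lookup for `toy_score` would be needed before comparing against any score in this report.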

Database : Published_Applications_AA: * 

1: /cgn2_6/ptodata/2/pubpaa/US07_PUBCOMB.pep:* 
2: /cgn2_6/ptodata/2/pubpaa/PCT_NEW_PUB.pep:* 
3: /cgn2_6/ptodata/2/pubpaa/US06_NEW_PUB.pep:* 
4: /cgn2_6/ptodata/2/pubpaa/US06_PUBCOMB.pep:* 
5: /cgn2_6/ptodata/2/pubpaa/US07_NEW_PUB.pep:* 
6: /cgn2_6/ptodata/2/pubpaa/PCTUS_PUBCOMB.pep:* 
7: /cgn2_6/ptodata/2/pubpaa/US08_NEW_PUB.pep:* 
8: /cgn2_6/ptodata/2/pubpaa/US08_PUBCOMB.pep:* 
9: /cgn2_6/ptodata/2/pubpaa/US09A_PUBCOMB.pep:* 
10: /cgn2_6/ptodata/2/pubpaa/US09B_PUBCOMB.pep:* 
11: /cgn2_6/ptodata/2/pubpaa/US09C_PUBCOMB.pep:* 
12: /cgn2_6/ptodata/2/pubpaa/US09_NEW_PUB.pep:* 
13: /cgn2_6/ptodata/2/pubpaa/US10A_PUBCOMB.pep:* 
14: /cgn2_6/ptodata/2/pubpaa/US10B_PUBCOMB.pep:* 
15: /cgn2_6/ptodata/2/pubpaa/US10C_PUBCOMB.pep:* 
16: /cgn2_6/ptodata/2/pubpaa/US10_NEW_PUB.pep:* 
17: /cgn2_6/ptodata/2/pubpaa/US60_NEW_PUB.pep:* 
18: /cgn2_6/ptodata/2/pubpaa/US60_PUBCOMB.pep:* 

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 

SUMMARIES 






Result Query 
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ALIGNMENTS 



RESULT 1 

US-09-989-981A-4 

; Sequence 4, Application US/09989981A 
; Publication No. US20030049730A1 
; GENERAL INFORMATION: 
; APPLICANT: Hobbs, Helen H. 
; APPLICANT: Shan, Bei 
; APPLICANT: Barnes, Robert 
; APPLICANT: Tian, Hui 
; APPLICANT: Tularik Inc. 
; APPLICANT: Board of Regents, The University of Texas System 
; TITLE OF INVENTION: ABCG5 and ABCG8: Compositions and Methods of Use 
; FILE REFERENCE: 018781-007320US 
; CURRENT APPLICATION NUMBER: US/09/989,981A 
; CURRENT FILING DATE: 2002-07-23 
; PRIOR APPLICATION NUMBER: US 60/252,235 
; PRIOR FILING DATE: 2000-11-20 
; PRIOR APPLICATION NUMBER: US 60/253,645 
; PRIOR FILING DATE: 2000-11-28 
; NUMBER OF SEQ ID NOS: 13 
; SOFTWARE: PatentIn Ver. 2.1 
; SEQ ID NO 4 
; LENGTH: 672 
; TYPE: PRT 
; ORGANISM: Mus musculus 
; FEATURE: 
; OTHER INFORMATION: mouse ABCG8 (mABCG8) 
US-09-989-981A-4 

Query Match 100.0%; Score 3494; DB 10; Length 672; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 672; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 



Qy   1 MAEKTKEETQLWNGTVLQDASGLQDSLFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIAS 60 
       |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
Db   1 MAEKTKEETQLWNGTVLQDASGLQDSLFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIAS 60 

Qy  61 QVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120 
       |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
Db  61 QVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120 

Qy 121 RGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFS 180 
       |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
Db 121 RGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFS 180 

Qy 181 QAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPT 240 
       |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
Db 181 QAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPT 240 

Qy 241 SGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQM 300 
       |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
Db 241 SGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQM 300 

Qy 301 VQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFL 360 
       |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
Db 301 VQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFL 360 

Qy 361 WKAEAKELNTSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHG 420 
       |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
Db 361 WKAEAKELNTSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHG 420 

Qy 421 SEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDWSKCHSERSMLYYE 480 
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
Db 421 SEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDWSKCHSERSMLYYE 480 

Qy 481 LEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLWFC 540 
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
Db 481 LEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLWFC 540 

Qy 541 CRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFS 600 
       |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
Db 541 CRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFS 600 

Qy 601 GLMQIQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYYLS 660 
       |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
Db 601 GLMQIQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYYLS 660 

Qy 661 LKLIKQKSIQDW 672 
       |||||||||||| 
Db 661 LKLIKQKSIQDW 672 
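
The "Query Match" percentages printed for each hit in this report appear consistent with the hit score divided by the perfect score from the search header (3494), rounded to one decimal place. The short Python sketch below illustrates that reading; the function name `query_match_percent` is hypothetical, and the formula is inferred from the printed numbers rather than taken from GenCore documentation.

```python
PERFECT_SCORE = 3494  # "Perfect score" line of the search header


def query_match_percent(score, perfect_score=PERFECT_SCORE):
    """Hit score as a percentage of the query's self-score (inferred relation)."""
    return round(100.0 * score / perfect_score, 1)


# Scores as printed for several hits in this report.
for hit_score in (3494, 2883.5, 1508.5, 701.5, 697):
    print(hit_score, query_match_percent(hit_score))
```

Under this assumption a hit of score 2883.5 gives 82.5 and a hit of score 701.5 gives 20.1, matching the values printed alongside those scores.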



RESULT 2 

US-09-989-981A-8 

Sequence 8, Application US/09989981A 
Publication No. US20030049730A1 
GENERAL INFORMATION: 
APPLICANT: Hobbs, Helen H. 
APPLICANT: Shan, Bei 
APPLICANT: Barnes, Robert 
APPLICANT: Tian, Hui 
APPLICANT: Tularik Inc. 

APPLICANT: Board of Regents, The University of Texas System 
TITLE OF INVENTION: ABCG5 and ABCG8 : Compositions and Methods of Use 
FILE REFERENCE: 018781-007320US 
CURRENT APPLICATION NUMBER: US/09/989,981A 
CURRENT FILING DATE: 2002-07-23 
PRIOR APPLICATION NUMBER: US 60/252,235 
PRIOR FILING DATE: 2000-11-20 
PRIOR APPLICATION NUMBER: US 60/253,645 
PRIOR FILING DATE: 2000-11-28 
NUMBER OF SEQ ID NOS: 13 
SOFTWARE: PatentIn Ver. 2.1 
SEQ ID NO 8 
LENGTH: 673 
TYPE: PRT 

ORGANISM: Homo sapiens 
FEATURE : 

OTHER INFORMATION: human ABCG8 (hABCG8) 
US-09-989-981A-8 

Query Match 82.5%; Score 2883.5; DB 10; Length 673; 

Best Local Similarity 81.9%; Pred. No. 1.7e-266; 

Matches 551; Conservative 52; Mismatches 69; Indels 1; Gaps 1; 

Qy 1 MAEKTKEETQLWNGTVLQDASGLQDSLFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIAS 60 

Ml || | I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I 
Db 1 MAGKAAEERGLPKGATPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLAS 60 



Qy 61 QVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120 

I I I I I I I I I I I I : I I I I I : I II I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I 
Db 61 

Qy 121 RGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFS 180 

I I I I I I : I I I I I I I I I I I I : I I I I I I M I I I I I I : I I I I I I I I I I I I I I I M I II I I I I I 
Db 121 RGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFS 180 

Qy 181 QAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPT 240 

II I I I I I II I I I I I I I II I II : I II I I I I I I : II I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 181 QAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPT 240 

Qy 241 SGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQM 300 

I I || I I || I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 241 SGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHM 300 

Qy 301 VQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFL 360 

I I I I I : I I : I I I I I I I I I I I I I I I I I I I I I I : I : I : I I I I I I I I I I I I I I I I : I I I I 
Db 301 VQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALFLEKVRDLDDFL 360 

Qy 361 WKAEAKELNTSTHTVSLTLTQDTDC-GTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIH 419 

| | | | | : | : | I I I : I : : : I I : : I I : I I I I II I I I I II I I I I I M I 
Db 361 WKAETKDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLIH 420 

Qy 420 GSEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDWSKCHSERSMLYY 479 

| : | | | | || : I I I I I : II I : I I I II I I I I I I I I II I I I I I I I I I I I : I I I : I I I : I II I 
Db 421 GAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLYY 480 

Qy 480 ELEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLWF 539 

I I I I II I I I I I I I I I I I I I I I I I I I I : I I I II III I I I I : II I II I I I I I I I I 
Db 481 ELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLWF 540 

Qy 540 CCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCF 599 

Ml | || I I : I : I I I I I I : I I I I I I I I I I I I I II I I I : I I I I I I I I I : I I I I I I I 
Db 541 CCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCF 600 

Qy 600 SGLMQIQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYYL 659 

|||:|||: I :M I :: I I : : I I I : I : I : I I I I I I I I I I I : I II: III: 
Db 601 EGLMKIQFSRRTYKMPLGNLTIAVSGDKILSAMELDSYPLYAIYLIVIGLSGGFMVLYYV 660 

Qy 660 SLKLIKQKSIQDW 672 

I I : I I I I III 
Db 661 SLRFIKQKPSQDW 673 
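
The "Best Local Similarity" value reported for each hit appears to equal Matches divided by the total number of alignment columns (Matches + Conservative + Mismatches + Indels). The Python sketch below parses one statistics line and recomputes the percentage under that assumption; the helper name `best_local_similarity` is hypothetical, and the relation is inferred from the numbers in this report, not from vendor documentation.

```python
import re


def best_local_similarity(stats_line):
    """Recompute the similarity percentage from a 'Matches ...; Gaps ...;' line."""
    pairs = re.findall(
        r"(Matches|Conservative|Mismatches|Indels|Gaps)\s+(\d+)", stats_line
    )
    counts = {name: int(value) for name, value in pairs}
    # Every alignment column is an identity, a conservative substitution,
    # a mismatch, or an inserted/deleted residue.
    columns = (
        counts["Matches"]
        + counts["Conservative"]
        + counts["Mismatches"]
        + counts["Indels"]
    )
    return round(100.0 * counts["Matches"] / columns, 1)


line = "Matches 551; Conservative 52; Mismatches 69; Indels 1; Gaps 1;"
print(best_local_similarity(line))
```

For the line shown, 551 identities over 673 columns gives 81.9, the value printed for that hit; the "Gaps" count is parsed but not used in the ratio.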



RESULT 3 
US-10-090-455-7 

; Sequence 7, Application US/10090455 

; Publication No. US20030027259A1 

; GENERAL INFORMATION: 

; APPLICANT: Chen, Hongyun 

; APPLICANT: Le Bihan, Stephane 

; TITLE OF INVENTION: NOVEL ABCG4 TRANSPORTER AND USES THEREOF 
; FILE REFERENCE: 100103.406 

; CURRENT APPLICATION NUMBER: US/10/090,455 
; CURRENT FILING DATE: 2002-03-01 
; NUMBER OF SEQ ID NOS : 17 

SOFTWARE: FastSEQ for Windows Version 4.0 



; SEQ ID NO 7 

LENGTH: 673 

TYPE: PRT 
; ORGANISM: Homo sapiens 
US-10-090-455-7 



Query Match 82.4%; Score 2879.5; DB 14; Length 673; 

Best Local Similarity 81.7%; Pred. No. 4e-266; 

Matches 550; Conservative 52; Mismatches 70; Indels 1; Gaps 1; 



Qy 1 MAEKTKEETQLWNGTVLQDASGLQDSLFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIAS 60 

Ml M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I 
Db 1 MAGKAAEERGLPKGATPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLAS 60 

Qy 61 QVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120 

||||||||||||:ll I I 1:1 I I I 11:1 I M II I M I I I i II I M I I I I I I I I M I I I 
Db 61 QVPWFEQLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120 

Qy 121 RGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFS 180 

I | | I I I : I I I I I I I I I I I I : I I I II I I I I I I I I I : I I I II I I I I I II I I I I I M M I I I I 
Db 121 RGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFS 180 

Qy 181 QAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPT 240 

I I I I I II I I I I I I I I I I I I I I : I I I II I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 181 QAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPT 240 

Qy 241 SGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQM 300 

I I I I I I I I I I II I I I I I I I I I I I I I I I I M I II I I I I I I I I I I I I I I I I I I I M I I I I 
Db 241 SGLDSFTAHNLVKTLSRLAXGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHM 300 

Qy 301 VQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFL 360 

I I I M : I I : I I I I I I I II I I I I II I I I II I I : I : I : I I I I I I I I I I I I I I I I : I I I I 
Db 301 VQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALFLEKVRDLDDFL 360 

Qy 361 WKAEAKELNTSTHTVSLTLTQDTDC-GTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIH 419 

| | | | | : | : | | I I : I : : : I I : : I I : I I I I I I I I I I I I I I I I I I I I 

Db 361 WKAETKDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLIH 420 

Qy 420 GSEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDWSKCHSERSMLYY 479 

I : I I M I I : I I I I I : I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I : I I I : I I M 
Db 421 GAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLYY 480 

Qy 480 ELEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLWF 539 

MINIM I II I I M M I I I I M I I M M I II III II I I : I I 1 I I I I I I I I I I 
Db 481 ELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLWLWF 540 

Qy 540 CCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCF 599 

Ml I || I I : I : I I I I I I : I I I I I I I I I I I I I I I II I Ml I I I I I I I : I II I I I I 
Db 541 CCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCF 600 

Qy 600 SGLMQIQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYYL 659 

111:111: I :|| I :: II ::| I : I : I : I I I I I I I I M I : I lh III: 
Db 601 EGLMKIQFSRRTYKMPLGNLTIAVSGDKILSVMELDSYPLYAIYLIVIGLSGGFMVLYYV 660 



Qy 660 SLKLIKQKSIQDW 672 

I I : I I II III 
Db 661 SLRFIKQKPSQDW 673 



RESULT 4 
US-10-415-378-9 

Sequence 9, Application US/10415378 
Publication No. US20040014945A1 
GENERAL INFORMATION: 
APPLICANT: INCYTE CORPORATION; TANG, Y. Tom 
APPLICANT: YUE, Henry; NGUYEN, Danniel B.; 
APPLICANT: HAFALIA, April J.A. ; ELLIOTT, Vicki S.; 
APPLICANT: LU, Yan; CHAWLA, Narinder K. ; 
APPLICANT: YAO, Monique G. ; BAUGHN, Marian R. ; 
APPLICANT: GANDHI, Ameena R. ; DING, Li; 

APPLICANT: SANJANWALA, Madhusudan M. ; RAMKUMAR, Jayalaxmi; 
APPLICANT: ARVIZU, Chandra S.; GIETZEN, Kimberly J.; 
APPLICANT: LAL, Preeti G. ; AZIMZAI, Yalda; 
APPLICANT: KHAN, Farrah A.; THANGAVELU, Kavitha; 
APPLICANT: THORNTON, Michael B. ; LU, Dyung Aina M. ; 
APPLICANT: TRIBOULEY, Catherine M. ; WARREN, Bridget A. ; 
APPLICANT: ISON, H. Craig; DAS, Debopriya; 
APPLICANT: RAUMANN, Brigette E. ; POLICKY, Jennifer L. ; 
APPLICANT: KEARNEY, Liam 

TITLE OF INVENTION: TRANSPORTERS AND ION CHANNELS 
FILE REFERENCE: PI-0270 USN 
CURRENT APPLICATION NUMBER: US/10/415, 378 
CURRENT FILING DATE: 2003-05-07 
PRIOR APPLICATION NUMBER: PCT/US01/46055 
PRIOR FILING DATE: 2001-10-27 
PRIOR APPLICATION NUMBER: US 60/250,790 
PRIOR FILING DATE: 2000-12-01 
PRIOR APPLICATION NUMBER: US 60/252,232 
PRIOR FILING DATE: 2000-11-20 
PRIOR APPLICATION NUMBER: US 60/249,661 
PRIOR FILING DATE: 2000-11-17 
PRIOR APPLICATION NUMBER: US 60/247,673 
PRIOR FILING DATE: 2000-11-09 
PRIOR APPLICATION NUMBER: US 60/245,904 
PRIOR FILING DATE: 2000-11-03 
PRIOR APPLICATION NUMBER: US 60/243,989 
PRIOR FILING DATE: 2000-10-27 
NUMBER OF SEQ ID NOS: 40 
SOFTWARE: PERL Program 
SEQ ID NO 9 
LENGTH: 374 
TYPE: PRT 

ORGANISM: Homo sapiens 
FEATURE: 

NAME/KEY: misc_feature 

OTHER INFORMATION: Incyte ID No. US20040014945A1 6585710CD1 
US-10-415-378-9 

Query Match 43.2%; Score 1508.5; DB 15; Length 374; 

Best Local Similarity 74.9%; Pred. No. 2.3e-135; 

Matches 280; Conservative 43; Mismatches 50; Indels 1; Gaps 1; 

Qy 300 MVQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDF 359 
I I I I I : I I : I I I I I I I I I I I I I I I I I II I I I : I : I : I I I I I I I I I II I I I I I : III 



Db 1 MVHYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALFLEKVRDLDDF 60 

Qy 360 LWKAEAKELNTSTHTVSLTLTQDTDC-GTAVELPGMIEQFSTLIRRQISNDFRDLPTLLI 418 

I I I I I I : I : I I I I : I : : : I I :: I I : I I I I I I I I I I M I I I I I I I 

Db 61 LWKAETKDLDEDTCVES SVTPLDTNCLPS PTKMPGAVQQFTTLI RRQI SNDFRDLPTLLI 120 

Qy 419 HGSEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDWSKCHSERSMLY 478 

I I : I I I I I I : I I I I I : I I I : I I I I I I I II I I I I I I I I II I I I I I I I : I I I : I I I : I M 
Db 121 HGAEACLMSMT I GFLYFGHGS IQLS FMDTAALLFMI GALI P FNVI LDVI S KCYS ERAMLY 180 

Qy 479 YELEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLW 538 

I I I II I I I I I I I I I I II I I I I I I I I I I : I I I II III I I I I : I I I I I I I I I I I I 
Db 181 YELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLW 240 

Qy 539 FCCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWC 598 

I M I I I I I I : I : I I I I I I : I I I I I I I I I I I I I I I I I I Ml I I I I I I I : I I I II I 
Db 241 FCCRIMALAAAALLPT FHMAS FFSNALYNS FYLAGGFMINLS S LWTVPAWI S KVS FLRWC 300 

Qy 599 FSGLMQIQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYY 658 

I I M : I I I : I : I I I : : I I : : I I I : I : I : I I I I I I I I I I I : I I I : I I I 
Db 301 FEGLMKIQFSRRTYKMPLGNLTIAVSGDKILSAMELDSYPLYAIYLIVIGLSGGFMVLYY 360 

Qy 659 LSLKLIKQKSIQDW 672 

: I I : I I I I I II 
Db 361 VSLRFIKQKPSQDW 374 



RESULT 5 
US-09-837-992-1 

; Sequence 1, Application US/09837992 

; Patent No. US20020081687A1 

; GENERAL INFORMATION: 

; APPLICANT: Tian, Hui 

; APPLICANT: Schultz, Joshua 

; APPLICANT: Shan, Bei 

; APPLICANT: Tularik Inc. 

; TITLE OF INVENTION: Sitosterolemia Susceptibility Gene (SSG): Compositions 
; TITLE OF INVENTION: and Methods of Use 
; FILE REFERENCE: 018781-006020US 
; CURRENT APPLICATION NUMBER: US/09/837,992 
; CURRENT FILING DATE: 2001-04-18 
; PRIOR APPLICATION NUMBER: US 60/198,465 
; PRIOR FILING DATE: 2000-04-18 
; PRIOR APPLICATION NUMBER: US 60/204,234 
; PRIOR FILING DATE: 2000-05-15 
; NUMBER OF SEQ ID NOS : 45 
; SOFTWARE: PatentIn Ver. 2.1 
; SEQ ID NO 1 
; LENGTH: 652 

TYPE: PRT 
; ORGANISM: Mus musculus 
FEATURE: 

; OTHER INFORMATION: mouse sitosterolemia susceptibility gene (SSG) 

; OTHER INFORMATION: amino acid sequence 

US-09-837-992-1 



Query Match 20.1%; Score 701.5; DB 9; Length 652; 



Best Local Similarity 29.1%; Pred. No. 8.6e-58; 

Matches 194; Conservative 131; Mismatches 245; Indels 97; Gaps 19; 



Qy 24 QDSLFSSESDNS LYFTYSGQSNTLEVRDLTYQVDIASQV-PWFEQLAQFKIPWRSHS 79 

I I : : I : : I I : : I I : : : : | II I I 

Db 27 QGSVTGTEARHSLGVLHVSYS VSNRVGPW WNIKS 60 

Qy 80 SQDSCELGI-RNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQP 138 

I : I : : : I : I I I : : I : I I I I I : : I I I I : I I I : : : : I I 

Db 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120 

Qy 139 STPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLR 198 

: I : : I I I I : I I I I I I I : I : I I = I :|:|| |: M I 

Db 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRS-SADFYNKKVEAVMTELSLS 179 

Qy 199 QCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRL 258 

I : : I : hi I II I I I : I : : : I I I I I : I I I II- M hi 

Db 180 HVADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAEL 239 

Qy 259 AKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPA 318 

| : : | : | : : : : I I II I : : I : II : : : I I : : I : : h : I : hill : I I I 
Db 240 ARRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPF 299 

Qy 319 DFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLT 378 

I I I : I I I h I : h I I h I : : I I II I I : : — =1 

Db 300 DFYMDLTSVDTQSREREIETYKRVQMLECAFKE SDI YHKI-LENIERARYLKTLP 353 

Qy 379 L TQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGF — 432 

: |:| III: h I I lh ::: : : I I : I 

Db 354 MVPFKTKDP PGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYL 405 

Qy 433 LYYGHGAKQLS FMDTAALLFMI GALI P FNVI LDWS KCH S ERSML YYELEDGLYTAGP YF 492 

| : : : I I | : : h : h h I :: I : I I I I 

Db 406 L RVQNNT LKGAVQ D RVGLL YQLVGAT P YT GMLNAVN L F PML RAVS DQ E S Q DG L YH KWQML 465 

Qy 493 FAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELF LL — HFLLVWLWFCCRTM 544 

I : | || : I :: II II I | | | : : | h 
Db 466 LAYVLHVLPFSVIATVI FS SVCYWTLGLYPEVARFGYFSAALLAPHLI GEFL TL 519 

Qy 545 ALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQ 604 

| | ::| : : : II : h : I : :| ::| h 

Db 520 VLLGIVQNPNI-VNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILW 578 

Qy 605 IQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISY 651 

: I -I III : h : I I : I I h : 
Db 579 NEFYGL NFTCGGSNTSML-- NHPMCA 1 TQGVQ F I EKT C P GAT S RFT 622 

Qy 652 -GFLFLY 657 

I I I I 

Db 623 ANFLILY 629 



RESULT 6 

US-09-989-981A-2 

; Sequence 2, Application US/09989981A 
; Publication No. US20030049730A1 
; GENERAL INFORMATION: 



APPLICANT: Hobbs, Helen H. 
APPLICANT: Shan, Bei 
APPLICANT: Barnes, Robert 
APPLICANT: Tian, Hui 
APPLICANT: Tularik Inc. 

APPLICANT: Board of Regents, The University of Texas System 
TITLE OF INVENTION: ABCG5 and ABCG8 : Compositions and Methods of Use 
FILE REFERENCE: 018781-007320US 
CURRENT APPLICATION NUMBER: US/09/989, 981A 
CURRENT FILING DATE: 2002-07-23 
PRIOR APPLICATION NUMBER: US 60/252,235 
PRIOR FILING DATE: 2000-11-20 
PRIOR APPLICATION NUMBER: US 60/253,645 
PRIOR FILING DATE: 2000-11-28 
NUMBER OF SEQ ID NOS: 13 
SOFTWARE: PatentIn Ver. 2.1 
SEQ ID NO 2 
LENGTH: 652 
TYPE: PRT 

ORGANISM: Mus musculus 
FEATURE: 

OTHER INFORMATION: mouse ABCG5 (mABCG5) 
US-09-989-981A-2 

Query Match 20.1%; Score 701.5; DB 10; Length 652; 

Best Local Similarity 29.1%; Pred. No. 8.6e-58; 

Matches 194; Conservative 131; Mismatches 245; Indels 97; Gaps 19; 

Qy 24 QDSLFSSESDNS LYFTYSGQSNTLEVRDLTYQVDIASQV-PWFEQLAQFKIPWRSHS 79 

I I : : I : : I I : : I I : : : : I I I I I 

Db 27 QGSVTGTEARHSLGVLHVSYS VSNRVGPW WNIKS 60 

Qy 80 SQDSCELGI-RNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQP 138 

| : | : : : I : I I I : : I : I I I I I : : I I I I : M |::::M 
Db 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120 

Qy 139 STPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLR 198 

: I : : I I I I : I I I I I I I : I : I I : I :|:ll h II I 

Db 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRS-SADFYNKKVEAVMTELSLS 179 

Qy 199 QCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRL 258 

|: :|: |:| llllllll III :| :::|llll:|ll lh :l h I 
Db 180 HVADQMI GS YNFGGI S S GERRRVS I AAQLLQDP KVMMLDEPTTGLDCMTANQI VLLLAEL 239 

Qy 259 AKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPA 318 

I: :|:|::::IMII::|: II : ::| I :: I ::|: : I : hill HM 
Db 240 ARRDRIVIVTIHQPRSELFQHFDKIAI LTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPF 299 

Qy 319 DFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLT 378 

I I I : I I I I : I : I : I I I : I : : I I I I I I — • : : I 

Db 300 DFYMDLTSVDTQSREREIETYKRVQMLECAFKE SDIYHKI-LENIERARYLKTLP 353 

Qy 379 L TQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGF 432 

: |:| III : |:|l I |: ::: : :| I : I 

Db 354 MVPFKTKDP PGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLI FYL 405 



Qy 433 LYYGHGAKQLSFMDTAALLFMIGALIPFNVILDWSKCHSERSMLYYELEDGLYTAGPYF 492 



I : : : | | | : : I : : I : I : I :: I : I I I I 

Db 406 LRVQNNTLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQML 465 

Qy 493 FAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELF LL--HFLLVWLWFCCRTM 544 

| : I I I : I :: II II I II I : : I I : 

Db 466 LAYVLHVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFL TL 519 

Qy 545 ALAASAMLPTFHMS S FFCNALYNS FYLTAGFMINLDNLWIVPAWI SKLS FLRWCFSGLMQ 604 

| | : : I : : : I I : I : : I : : I : : I I : 

Db 520 VLLGIVQNPNI-VNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILW 578 

Qy 605 IQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISY 651 

: I I III : I : : I I : I I I : : 
Db 579 NEFYGL NFTCGGSNTSML NHPMCA 1 T Q GVQ F I EKT C PGAT S RFT 622 

Qy 652 -GFLFLY 657 

I I I I 

Db 623 ANFLILY 629 



RESULT 7 
US-09-837-992-3 

; Sequence 3, Application US/09837992 

; Patent No. US20020081687A1 

; GENERAL INFORMATION: 

; APPLICANT: Tian, Hui 

; APPLICANT: Schultz, Joshua 

; APPLICANT: Shan, Bei 

; APPLICANT: Tularik Inc. 

; TITLE OF INVENTION: Sitosterolemia Susceptibility Gene (SSG) : Compositions 

; TITLE OF INVENTION: and Methods of Use 

; FILE REFERENCE: 018781-006020US 

; CURRENT APPLICATION NUMBER: US/09/837,992 

; CURRENT FILING DATE: 2001-04-18 

; PRIOR APPLICATION NUMBER: US 60/198,465 

; PRIOR FILING DATE: 2000-04-18 

; PRIOR APPLICATION NUMBER: US 60/204,234 

; PRIOR FILING DATE: 2000-05-15 

; NUMBER OF SEQ ID NOS : 45 

; SOFTWARE: PatentIn Ver. 2.1 
; SEQ ID NO 3 

LENGTH: 651 

TYPE: PRT 
; ORGANISM: Homo sapiens 

FEATURE : 

; OTHER INFORMATION: human sitosterolemia susceptibility gene (SSG) 

; OTHER INFORMATION: amino acid sequence 

US-09-837-992-3 

Query Match 19.9%; Score 697; DB 9; Length 651; 

Best Local Similarity 29.1%; Pred. No. 2.3e-57; 

Matches 195; Conservative 129; Mismatches 263; Indels 84; Gaps 18; 

Qy 17 LQDASGLQDSL FSSESDNSLYFTYSGQSNTLEVRDLTYQVDIASQVPWFEQLAQFK 72 

|| | | | | : : : I I : : I : I I II— : : 

Db 15 LQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVR PWWD-ITSCR 61 

Qy 73 IPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGR-GHGGKMKSGQ 131 

I ::::| 1 1 M : : 1 : 1 1 1 1 1 : : M 1 : : 1 1 1 1 h 
Db 62 QQWTRQI LKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTF-LGE 112 

Qy 132 IWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDV 191 

:::||: : : 1 : : 1 1 1 1 1 : 1 1 1 1 1 1 1 : 1 : : 1 : hill 
Db 113 VYVNGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAI-RRGNPGSFQKKVEAV 171 

Qy 192 IAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNL 251 

: I I 1 1 1 : : 1 1 : h 1 1 1 1 1 1 1 1 1 1 1 1 : 1 : : : 1 M 1 : 1 1 1 1 h : 
Db 172 MAELSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQI 231 

Qy 252 VTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPC 311 

I I | h 1 h h : : : 1 1 1 1 h : h 1 1 1 : : : : 1 hi : h : 1 h II 
Db 232 WLLVELARRNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPC 291 

Qy 312 PRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTS 371 

1 : 1 M 1 1 h 1 1 1 h 1 : 1 1 1 1 h 1 : : 1 : : : ' : 1 : 
Db 292 PEHSNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSA ICHKTLKNIERM 345 

Qy 372 THTVSLTL TQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMS 427 

1 : 1 : h h II:: h 1 1 1 h : : = s I 
Db 346 KHLKTLPMVPFKTKDS PGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMG 397 

Qy 428 LIIGFLYYGHGAKQL--SFMDTAALLFMIGALIPFNVILDWSKCHSERSMLYYELEDGL 485 

| : | : | : | ||: h :|: |: \i: 1 :||| 
Db 398 LFLLFFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGL 457 

Qy 486 YTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELF LL -- HFLLVWLV 537 

I 1 1 1 1 : h: II M 1 | | | : : 1 
Db 458 YQKWQMMLAYALHVLPFSWATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFL- 516 

Qy 538 VFCCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRW 597 

|:| 1 ::| : :| |: |: : | || :| :: 
Db 517 TLVLLGIVQNPNI VNSWALLSIAGVLVGSGFLRNIQEMPIPFKIISYFTFQKY 570 

Qy 598 CFSGLMQIQFNGHLYTTQIGNFTFSILGDTM ISAMDLNSHPLY 640 

i i • • i I • 1 I ■ 1**1 1 : 1 II 
Db 571 CSEILWNEFYGLNFT--CGSSNVSVTTNPMCAFTQGIQFIEKTCPGATSRFTMNFLILY 628 

Qy 641 AIY--LIVIGI 649 

: |:::|| 
Db 629 SFIPALVILGI 639 


RESULT 8 

US-09-989-981A-6 

; Sequence 6, Application US/09989981A 

; Publication No. US20030049730A1 

; GENERAL INFORMATION: 

; APPLICANT: Hobbs, Helen H. 

; APPLICANT: Shan, Bei 

; APPLICANT: Barnes, Robert 

; APPLICANT: Tian, Hui 

; APPLICANT: Tularik Inc. 

; APPLICANT: Board of Regents, The University of Texas System 

; TITLE OF INVENTION: ABCG5 and ABCG8 : Compositions and Methods of Use 



; FILE REFERENCE: 018781-007320US 

; CURRENT APPLICATION NUMBER: US/09/989, 981A 

; CURRENT FILING DATE: 2002-07-23 

; PRIOR APPLICATION NUMBER: US 60/252,235 

; PRIOR FILING DATE: 2000-11-20 

; PRIOR APPLICATION NUMBER: US 60/253,645 

; PRIOR FILING DATE: 2000-11-28 

; NUMBER OF SEQ ID NOS : 13 

; SOFTWARE: Patentln Ver. 2.1 

; SEQ ID NO 6 

LENGTH: 651 
; TYPE: PRT 

; ORGANISM: Homo sapiens 
FEATURE : 

OTHER INFORMATION: human ABCG5 (hABCG5 ) 
US-09-989-981A-6 



Query Match 19.9%; Score 697; DB 10; Length 651; 

Best Local Similarity 29.1%; Pred. No. 2.3e-57; 

Matches 195; Conservative 129; Mismatches 263; Indels 84; Gaps 18; 

Qy 17 LQDASGLQDSL FSSESDNSLYFTYSGQSNTLEVRDLTYQVDIASQVPWFEQLAQFK 72 

II I I I I :: :| I :: I : I I I |:: : : 

Db 15 LQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVR PWWD-ITSCR 61 

Qy 73 IPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGR-GHGGKMKSGQ 131 

| : : : : I I I I I : : I : I I I I I : : I I I : : I I I I I = 

Db 62 QQWTRQI LKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTF-LGE 112 

Qy 132 IWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDV 191 

: : : I I : : : I : : I I I I I : I I I I I I I : I : : I : I : I I I 

Db 113 VYVNGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAI-RRGNPGSFQKKVEAV 171 

Qy 192 IAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNL 251 

:|||| I : : I I : I : I I I I I I I I I I I I : I : : : I I I I : I I I I I : : 
Db 172 MAELSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQI 231 

Qy 252 VTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPC 311 

| | ||: l|:|::::IMII::|:lll : ::: I |: I :|: :| 1:11 
Db 232 VVLLVELARRNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPC 291 

Qy 312 PRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTS 371 

I : I I I I I I : I I I I : I : I I I I I : I I : : : : : I : 

Db 292 PEHSNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSA ICHKTLKNIERM 345 

Qy 372 THTVSLTL TQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMS 427 

I : I : |:|: lh : I : M I I : :: : :| 

Db 346 KHLKTLPMVPFKTKDS PGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMG 397 

Qy 428 LI I GFLYYGHGAKQL — SFMDTAALLFMIGALIPFNVILDWSKCHSERSMLYYELEDGL 485 

| : | : | : I I I : I : : I : I : I :: I : I I I 

Db 398 L FL LF FVL RVRS NVL KGAI Q DRVGL L YQ FVGAT P YT GMLNAVN L FP VL RAVS DQ E S Q DGL 457 

Qy 486 YTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELF LL — HFLLVWLV 537 

I I I II :h: II II I | | I : :| 

Db 458 YQKWQMMLAYALHVLPFSVVATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFL- 516 



Qy 538 VFCCRTMALAASAMLPT FHMS S FFCNALYN S FYLTAGFMINLDNLWI VPAWI S KLS FLRW 597 

|:| I ::| : : | | : | : : I II :| :: 

Db 517 TLVLLGIVQNPNI-VNSWALLSIAGVLVGSGFLRNIQEMPIPFKIISYFTFQKY 570 

Qy 598 CFSGLMQIQFNGHLYTTQIGNFTFSILGDTM ISAMDLNSHPLY 640 

I | : : | | : I I : I : : I I : I II 

Db 571 CSEI LWNEFYGLNFT — CGS SNVSVTTNPMCAFTQGIQFI EKTCPGATS RFTMNFLI LY 628 

Qy 641 AIY--LIVIGI 649 

: I : : : I I 

Db 629 SFIPALVILGI 639 



RESULT 9 
US-10-090-455-6 

; Sequence 6, Application US/10090455 

; Publication No. US20030027259A1 

; GENERAL INFORMATION: 

; APPLICANT: Chen, Hongyun 

; APPLICANT: Le Bihan, Stephane 

; TITLE OF INVENTION: NOVEL ABCG4 TRANSPORTER AND USES THEREOF 
; FILE REFERENCE: 100103.406 

; CURRENT APPLICATION NUMBER: US/10/090,455 
; CURRENT FILING DATE: 2002-03-01 
; NUMBER OF SEQ ID NOS : 17 

; SOFTWARE: FastSEQ for Windows Version 4.0 
; SEQ ID NO 6 

LENGTH: 651 

TYPE: PRT 
; ORGANISM: Homo sapiens 
US-10-090-455-6 



Query Match 19.9%; Score 697; DB 14; Length 651; 

Best Local Similarity 29.1%; Pred. No. 2.3e-57; 

Matches 195; Conservative 129; Mismatches 263; Indels 84; Gaps 18; 



Qy 17 LQDASGLQDSL FSSESDNSLYFTYSGQSNTLEVRDLTYQVDIASQVPWFEQLAQFK 72 

II 1 1 1 1 : : : 1 1 : : 1 : 1 1 1 1 : : ': : 

Db 15 LQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVR PWWD-ITSCR 61 

Qy 73 IPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGR-GHGGKMKSGQ 131 

I : : : : 1 1 1 1 1 : : 1 : 1 1 1 1 1 : : 1 1 1 : : 1 1 1 1 1 : 

Db 62 QQWTRQI LKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTF-LGE 112 

Qy 132 IWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDV 191 

: : : I 1 : : : 1 : : 1 1 1 II : 1 1 1 1 1 1 1 : 1 : : 1 = hill 

Db 113 VYWGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYT7U.IAI-RRGNPGSFQKKVEAV 171 

Qy 192 IAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNL 251 

: 1 1 1 1 h : 1 1 : h 1 1 1 1 1 1 1 1 1 1 II : 1 : : : 1 1 1 h 1 1 1 1 h : 

Db 172 MAELSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQI 231 

Qy 252 VTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPC 311 

I | | | : | | : | : : : : | | I I |: : h 1 1 1 : : : : 1 hi : h : 1 h 1 1 

Db 232 WLLVELARRNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPC 291 

Qy 312 PRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTS 371 



I : I I I I I I : I I I I : I : I I I I I : I : : I : : : : : I : 

Db 292 PEHSNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSA ICHKTLKNIERM 345 

Qy 372 THTVSLTL TQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMS 427 

I : I : I : I : I I : : I : I I I I : : : : : I 

Db 346 KHLKTLPMVPFKTKDS PGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMG 397 

Qy 428 LI I GFL YYGHGAKQL — S FMDTAALL FMI GALI PFNVI LDWS KCHS ERSMLY YELEDGL 485 

I : I : | : I I I : I : : I : I : I :: I : I I I 

Db 398 L FL L F FVL RVRS N VL KGAI Q DRVGLL YQ FVGAT P YT GMLN AVN L F P VL RAVS DQ E S Q D GL 457 

Qy 486 YTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELF LL — HFLLVWLV 537 

I I I I I : I : : II II I I I I : : I 

Db 458 YQKWQMMLAYALHVLPFS WATMI FS SVCYWTLGLHPEVARFGYFSAALLAPHLI GEFL- 516 

Qy 538 VFCCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRW 597 

I : I I : : I : : I I : I : : I I I : I : : 

Db 517 TLVLLGIVQNPNI-VNSWALLSIAGVLVGSGFLRNIQEMPIPFKIISYFTFQKY 570 

Qy 598 C FS GLMQ I Q FNGHL YTTQI GN FT FS I LGDTM 1 SAMDLNSHPLY 640 

I | : : | I : I I : I : : I I : I II 

Db 571 CSEILWNEFYGLNFT — CGSSNVSVTTNPMCAFTQGIQFIEKTCPGATSRFTMNFLILY 628 

Qy 641 AIY — LIVIGI 649 

: |:::|| 
Db 629 SFIPALVILGI 639 
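
The percentages in a result header such as the one for RESULT 9 appear to follow from the other fields. The sketch below recomputes them under two assumptions that the report does not state explicitly: Query Match is taken as Score divided by the perfect score, and Best Local Similarity as Matches divided by the number of alignment columns (Matches + Conservative + Mismatches + Indels). The perfect score of 3494 is the value a run header in this report lists for the 672-residue query; `parse_stats` and `recompute` are illustrative helper names, not part of GenCore.

```python
import re

# Field names as they appear in the result headers of this report.
KEYS = ["Score", "Matches", "Conservative", "Mismatches", "Indels", "Gaps"]

def parse_stats(text):
    """Extract the numeric value that follows each known field name."""
    stats = {}
    for key in KEYS:
        m = re.search(re.escape(key) + r"\s+([0-9]+(?:\.[0-9]+)?)", text)
        if m:
            stats[key] = float(m.group(1))
    return stats

def recompute(stats, perfect_score):
    """Recompute both percentages under the assumed definitions."""
    columns = (stats["Matches"] + stats["Conservative"]
               + stats["Mismatches"] + stats["Indels"])
    return {
        "query_match": round(100.0 * stats["Score"] / perfect_score, 1),
        "best_local_similarity": round(100.0 * stats["Matches"] / columns, 1),
    }

# Header of RESULT 9, copied from the report.
header = ("Query Match 19.9%; Score 697; DB 14; Length 651; "
          "Best Local Similarity 29.1%; Pred. No. 2.3e-57; "
          "Matches 195; Conservative 129; Mismatches 263; Indels 84; Gaps 18;")

print(recompute(parse_stats(header), perfect_score=3494))
```

Running the same check on other headers is a quick way to spot digits that were damaged in extraction, since a printed percentage that disagrees with its own counts is suspect.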



RESULT 10 
US-09-866-866A-14 

; Sequence 14, Application US/09866866A 

; Patent No. US20020102244A1 

; GENERAL INFORMATION: 

; APPLICANT: Sorrentino, Brian 

; APPLICANT: Schuetz, John 

TITLE OF INVENTION: A Method of Identifying and/or Isolating Stem Cells 
; FILE REFERENCE: 1340-1-021CIP2 
; CURRENT APPLICATION NUMBER: US/09/866, 866A 
; CURRENT FILING DATE: 2001-08-30 
; PRIOR APPLICATION NUMBER: 09/584,586 
; PRIOR FILING DATE: 2000-05-31 
; PRIOR APPLICATION NUMBER: PCT/US99/11825 
; PRIOR FILING DATE: 1999-05-27 
; PRIOR APPLICATION NUMBER: 60/086,988 
; PRIOR FILING DATE: 1998-05-28 
; NUMBER OF SEQ ID NOS : 27 
; SOFTWARE: PatentIn version 3.0 
; SEQ ID NO 14 
; LENGTH: 657 
; TYPE: PRT 

; ORGANISM: Mus musculus 
US-09-866-866A-14 

Query Match 19.2%; Score 672.5; DB 9; Length 657; 

Best Local Similarity 27.2%; Pred. No. 5.2e-55; 

Matches 176; Conservative 136; Mismatches 241; Indels 93; Gaps 16; 



Qy 91 LSF KVRSGQML AIIGSSGCGRASLLDVITGRG 122 

Ml 11:11 :: I I : I : I I : : I I I I I : I 

Db 37 LSFHHITYRVKVKSGFLVRKTVEKEILSDINGIMKPGLNAILGPTGGGKSSLLDVLAAR- 95 

Qy 123 HGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQA 182 

I II : I I I I I : I :| I I :: I I I I I I I I : I I I I 
Db 96 KDPKGLSGDVLINGAPQ-PAHFKCCSGYWQDDWMGTLTVRENLQFSAALRLPTTMKNH 154 

Qy 183 QRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSG 242 

::::!: : I II I : I : : : I I : : I I : I I I I I : I I I I : : I : : I II I I I I I : I 
Db 155 EKNERINTIIKELGLEKVADSKVGTQFIRGISGGERKRTSIGMELITDPSILFLDEPTTG 214 

Qy 243 LDSFTAHNLWTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQ 302 

Mill::: | | : : | I : : I : I I II I I : I I I : I : II : : I I I : : : 
Db 215 LDSSTANAVLLLLKRMSKQGRTIIFSIHQPRYSIFKLFDSLTLLASGKLVFHGPAQK7^LE 274 

Qy 303 YFTSIGHPCPRYSNPADFYVDLTS IDRRSKEREVATVEKAQSLAALFLEKVQG 355 

I I I I : I I : I I I I I : : I : : : M : : I I : : I : 

Db 275 YFASAGYHCEPYNNPADFFLDVINGDSSAVMLNREEQDNEANKTEEPSKGEKPVIENLSE 334 

Qy 356 F — DDFLWKAEAKELNTSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDL 413 

I : :: IM I I • :| : I : I :| |::| 

Db 335 FYINSAIYGETKAELD QLPGAQEKKGTSAFKEPVYVTSFCHQLRWIARRSFKNL 388 

Qy 414 PTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTA7VLLFMIGALIPFNVILD 465 

|:: :: MM :M : I :|| : 
Db 389 LGNPQASVAQLIV TVILGLIIGAIYFDLKYDAAGMQNRAGVLFFL 433 

Qy 4 66 WSKCHS ERSMLYYELEDGLYTAGPYFFAKILGE-LPEHCAYVIIYAMPI 514 

: : I I I : : : I II I I I I : : : I I : I : : 

Db 434 TTNQCFSSVSAVELFWEKKLFIHEYISGYYRVSSYFFGKVMSDLLPMRFLPSVI FTCIL 493 

Qy 515 YWLTNLRPVPELFLLHFLLWLWFCCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAG 574 

I : : I : : I : : : I : Mill: : : : : I M 

Db 4 94 YFMLGLKKTVDAFFIMMFTLIMVAYTASSMALAIATGQSWSVATL^^ 553 

Qy 575 FMINLDNL — WIVPAWISKLSFLRWCFSGLMQIQFNGHLY TTQIGNFTFSI 623 

: : I I : I : : I : I I : I : I Ml : : | : : : 

Db 554 LLVNLRTI GPWL — SWLQYFS I PRYGFTALQYNEFLGQEFCPGFNVTDNSTCVNS YAICT 611 

Qy 624 LGDTMIS-AMDLNSHPLYAIYLIVTGISYGFLFLYYLSLKLIKQKS 668 

: M : : : I : I : : : : : I I : II I M : I 
Db 612 GNEYLINQGIELSPWGLWKNHVALACMI I I FLTIAYLKLLFLKKYS 657 



RESULT 11 
US-09-981-353-35 

Sequence 35, Application US/09981353 
Patent No. US20020160382A1 
GENERAL INFORMATION : 
APPLICANT: Lasek, Amy W. 
APPLICANT: Jones, David A. 

TITLE OF INVENTION: GENES EXPRESSED IN COLON CANCER 
FILE REFERENCE: PA- 003 8 US 

CURRENT APPLICATION NUMBER: US/09/981,353 
CURRENT FILING DATE: 2001-10-11 
NUMBER OF SEQ ID NOS : 194 



; SOFTWARE: PERL Program 
; SEQ ID NO 35 

LENGTH: 655 
; TYPE: PRT 

; ORGANISM: Homo sapiens 
FEATURE : 

NAME/KEY: misc_feature 

OTHER INFORMATION: Incyte ID No. US20020160382A1 5517972CD1 
US-09-981-353-35 

Query Match 18.9%; Score 659.5; DB 9; Length 655; 

Best Local Similarity 27.2%; Pred. No. 9e-54; 

Matches 185; Conservative 141; Mismatches 270; Indels 85; Gaps 21; 

Qy 28 FSSESDNSL-YFTYSGQSNTLEVRDLTYQVDIASQVPWFEQLAQFKIPWRSHSSQDSCEL 86 

I : : I I I I : I :: I : I : I : I I :: 
Db 2 0 F PAT AS N D L KAFT EGAVLSFHNICYRVKLKSGF LPCRKPVEKEI 63 

Qy 87 GI RNLS FKVRS GQMLAI I GS S GCGRAS LLDVI TGRGHGGKMKS GQI WINGQPST PQLVRK 146 

: I : : : : I : I I : I : I I : : I I I I I : I : I I : I I I I I 

Db 64 -LSNINGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPSGL-SGDVLINGAPRPANF— K 118 

Qy 147 C-VAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRV 205 

I : I I I : : Mill I I I :|l I : ::::|: Mill: |:::l 
Db 119 CNSGYWQDDVVMGTLTVRENLQFSAALRLATTMTNHEKNERINRVIQELGLDKVADSKV 178 

Qy 206 GNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLV 265 

I : : || I I I I I I : I I II : M : : I I I I II I I : I I M I I : : : I MM I : 
Db 17 9 GTQFIRGVSGGERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTI 238 



Qy 266 LISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPADFYVDLT 325 

: MUM 11 = 111 : I: II :: I II: : II I M I MIMIMM: 
Db 239 I FSIHQPRYSI FKLFDSLTLIASGRLMFHGPAQEALGYFESAGYHCEAYNNPADFFLDI I 298 

Qy 32 6 SIDRRS KEREVATVEK AQSLAALFLEKVQGFDDFL — WKAEAKELN 369 

: I : ||: I : I I : : : I III M : 

Db 299 NGDSTAVALNREEDFKATEIIEPSKQDKPLIEKLAEIYVN SSFYKETKAELHQLS 353 

Qy 37 0 TSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLI 42 9 

: | : : : | : I : : I I : : : : M 

Db 354 GGEKKKKI TVFKEI S YTT S FCHQLRWVSKRSFKNLLGNPQASIAQIIVTWLGLV 408 

Qy 430 I GFLYYGHGAKQLS FMDTAALLFMI GALI P FNVILDWS KCHS ERSMLYY 479 

II MM : | :|| : :M I M : : 

Db 409 IGAIYFGLKNDSTGIQNRAGVLFFL TTNQCFSSVSAVELFWEKKLFIH 457 

Qy 480 ELEDGLYTAGPYFFAKILGE-LPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLW 538 

I || II I : I : I I II: M : : I M : I : : M 

Db 458 EYISGYYRVSSYFLGKLLSDLLPMRMLPSIIFTCIWFMLGLKPKADAFFVMMFTLMMVA 517 



Qy 539 FCCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNL — WIVPAWISKLSFLR 596 

: Mill M ::: : : : I : : I I : I : M : II 

Db 518 YSAS SMALAI AAGQS WS VATLLMT I CFVFMMI FS GLLVNLTT I ASWL — SWLQYFS I PR 575 

Qy 597 WCFSGLMQIQFNGHLYTTQIG NFTFSILGDTMI — SAMDLNSHPLYAIYLIVI 647 

: I : I Ml: : | : : | : : MM I : : : : 

Db 576 YGFTALQHNEFLGQNFCPGLNATGNNPCNYA-TCTGEEYLVKQGIDLSPWGLWKNHVALA 634 



Qy 648 GISYGFLFLYYLSLKLIKQKS 668 

Db 635 CMIVI FLTIAYLKLLFLKKYS 655 



RESULT 12 
US-10-120-687-61 

; Sequence 61, Application US/10120687 
; Publication No. US20030082155A1 
; GENERAL INFORMATION: 

; APPLICANT: Massachusetts General Hospital 

; TITLE OF INVENTION: Stem Cells of the Islets of Langerhans and Their Use in Treating Diabetes 

; TITLE OF INVENTION: Mellitus 
; FILE REFERENCE: 3284/1235B 

; CURRENT APPLICATION NUMBER: US/10/120, 687 

; CURRENT FILING DATE: 2002-04-11 

; PRIOR APPLICATION NUMBER: US60/169082 

; PRIOR FILING DATE: 1999-12-06 

; PRIOR APPLICATION NUMBER: US 09/963,875 

; PRIOR FILING DATE: 2001-09-25 

; PRIOR APPLICATION NUMBER: US 60/215109 

; PRIOR FILING DATE: 2000-06-28 

; PRIOR APPLICATION NUMBER: US 60/238880 

; PRIOR FILING DATE: 2000-10-06 

; PRIOR APPLICATION NUMBER: US 09/731261 

; PRIOR FILING DATE: 2000-12-06 

; NUMBER OF SEQ ID NOS : 61 

; SOFTWARE: PatentIn version 3.1 

; SEQ ID NO 61 

LENGTH: 655 

TYPE: PRT 
; ORGANISM: Homo sapiens 
US-10-120-687-61 

Query Match 18.9%; Score 659.5; DB 14; Length 655; 

Best Local Similarity 27.2%; Pred. No. 9e-54; 

Matches 185; Conservative 141; Mismatches 270; Indels 85; Gaps 21; 
Qy 28 FSSESDNSL-YFTYSGQSNTLEVRDLTYQVDIASQVPWFEQLAQFKIPWRSHSSQDSCEL 86 



i • • j i ii • i . . i . i - i 

Db 20 FPATASNDLKAFT EGAVLSFHNICYRVKLKSGF- LPCRKPVEKEI 63 

Qy 87 GIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVRK 146 

; l . . ; ; | . | | . | . | | . . | | | i | . | -11*1111 i 

Db 64 -LSNINGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPSGL-SGDVLINGAPRPANF — K 118 

Qy 147 C-VAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRV 205 

i . i i i • • i i i i i i i i • i i i i • ii ii i * i • • • i 

Db 119 CNSGYWQDDVVMGTLTVRENLQFSAALRLATTMTNHEKNERINRVIQELGLDKVADSKV 178 

Qy 206 GNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLV 265 

I •• i i i i i ii i • i i ) i • • i * * i ii i i i i i • • • i i • * i i 

Db 179 GTQFIRGVSGGERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTI 238 

Qy 266 LISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPADFYVDLT 325 



: I : M I I 11:111:1:11 : : I I I : : I I I I : I I : I I I I I : : I : 
Db 239 IFSIHQPRYSIFKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNNPADFFLDII 298 

Qy 32 6 SIDRRS KEREVATVEK AQSLAALFLEKVQGFDDFL — WKAEAKELN 369 

: | : ||: I : I I : : : I I I I : I : 

Db 299 NGDSTAVALNREEDFKATEI I EPSKQDKPLI EKLAEI YVN SSFYKETKAELHQLS 353 

Qy 370 TSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLI 429 

:|: :: |: I : :| I : : :: h 

Db 354 GGEKKKKITVFKEISYTTS FCHQLRWVSKRS FKNLLGNPQAS IAQI I VTWLGLV 408 

Qy 430 I GFLYYGHGAKQLS FMDTAALLFMI GALI P FNVI LDWS KCHS ERSMLYY 479 

II :|:| : | :|| : ::| I |: : : 

Db 409 IGAIYFGLKNDSTGIQNRAGVLFFL TTNQCFS SVSAVELFWEKKLFIH 457 

Q y 480 ELEDGLYTAGPYFFAKILGE-LPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLW 538 

| || || |:| : || II: :|:: |:| : I : : :| 

D b 458 EYISGYYRVSSYFLGKLLSDLLPMRMLPSIIFTCIVYFMLGLKPKADAFFVMMFTLMMVA 517 

Qy 539 FCCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNL — WIVPAWISKLSFLR 596 

: :|||| :| ::: : : :| ::|| : I: :|: I I 

Db 518 YSASSMALAIAAGQSVVSVATLLMTICFVFMMIFSGLLVNLTTIASWL — SWLQYFSIPR 575 

Qy 597 WCFSGLMQIQFNGHLYTTQIG NFT FS I LGDTMI — S AMDLNSHPL YAI YLI VI 647 

: | : | : I I : : | : : | : : : I I : I : : : : 

Db 57 6 YGFTALQHNEFLGQNFCPGLNATGNNPCNYA-TCTGEEYLVKQGIDLSPWGLWKNHVALA 634 

Qy 648 GISYGFLFLYYLSLKLIKQKS 668 

: I I : I I I : I : I 

Db 635 CMIVIFLTIAYLKLLFLKKYS 655 



RESULT 13 
US-10-405-806-2 

; Sequence 2, Application US/10405806 

; Publication No. US20030232362A1 

; GENERAL INFORMATION: 

; APPLICANT: KOMATANI, HIDEYA 

; APPLICANT: HARA, YOSHIKAZU 

; APPLICANT: KOTANI, HIDEHITO 

; APPLICANT: NAKAGAWA, RINAKO 

; TITLE OF INVENTION: DRUG RESISTANT GENE AND USE THEREOF 

; FILE REFERENCE: 234985US0CONT 

; CURRENT APPLICATION NUMBER: US/10/405,806 

; CURRENT FILING DATE: 2003-04-03 

; PRIOR APPLICATION NUMBER: PCT/JP01/08112 

; PRIOR FILING DATE: 2001-09-18 

; PRIOR APPLICATION NUMBER: JP2000-303441 

; PRIOR FILING DATE: 2000-10-03 

; NUMBER OF SEQ ID NOS : 17 

; SOFTWARE: PatentIn version 3.2 

; SEQ ID NO 2 

LENGTH: 655 

TYPE: PRT 
; ORGANISM: Homo sapiens 
US-10-405-806-2 



Query Match 18.9%; Score 659.5; DB 15; Length 655; 

Best Local Similarity 27.2%; Pred. No. 9e-54; 

Matches 185; Conservative 141; Mismatches 270; Indels 85; Gaps 21; 

Q y 28 FSSESDNSL-YFTYSGQSNTLEVRDLTYQVDIASQVPWFEQLAQFKIPWRSHSSQDSCEL 86 

| : : I I I I : I : : I : I : I : I I - 

Db 20 F PAT AS N D L KAFT EGAVLS FHNI CYRVKLKSGF LPCRKPVEKEI 63 

Qy 87 GIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVRK 146 

: | : : : : I : I I : I : I I : : I I I I I : I : I I : I I I 1 I 

Db 64 -LSNINGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPSGL-SGDVLINGAPRPANF--K 118 

Qy 147 C-VAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRV 205 

| : | | | : : | I I I I I I I : I I I : : : : : I : | | | | I : I : : : I 
D b 119 CNSGYWQDDVVMGTLTVRENLQFSA7U.RLATTMTNHEKNERINRVIQELGLDKVADSKV 178 

Qy 206 GNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLV 265 

| ::||||||||:| ll|::|: : I I I I I I I I : I I I I I I : : : I l"l I : 
D b 179 GTQFIRGVSGGERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTI 238 

Q y 266 LISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPADFYVDLT 325 

: | : | I I I I I : I I I : I : I I : : I I I : : I I I I : I I : I I I I I : : I : 
Db 239 IFSIHQPRYSIFKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNNPADFFLDII 298 

Qy 326 SIDRRS KEREVATVEK AQSLAALFLEKVQGFDDFL — WKAEAKELN 369 

: | : ||: I : II ::: I I I I : I 

D b 299 NGDSTAVALNREEDFKATEIIEPSKQDKPLIEKLAEIYVN SSFYKETKAELHQLS 353 

Qy 370 TSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLI 429 

: | : : : I : I : : I I : : : : I : 

Db 354 GGEKKKKITVFKEISYTTS FCHQLRWVSKRS FKNLLGNP.QAS I AQI I VTWLGLV 408 

Q y 430 I GFLYYGHGAKQLS FMDTAALLFMI GALI PFNVT LDWSKCHS ERSMLYY 479 

|| :|:| : I :ll : ::l I 1= : * 

Db 409 IGAIYFGLKNDSTGIQNRAGVLFFL TTNQCFSSVSAVELFWEKKLFIH 457 

Qy 480 ELEDGLYTAGPYFFAKILGE-LPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLW 538 

| || II I : I : M II: : I hi : I : : : I 

Db 458 EYI SGYYRVS S YFLGKLLSDLLPMRMLPS 1 1 FTCI VYFMLGLKPKADAFFVMMFTLMMVA 517 

Qy 539 FCCRTMALAAS AMLPT FHMS S FFCNAL YNS FYLTAGFMINLDNL — WI VPAWI SKLS FLR 596 

: : | | I I : I : : : : : : I : : I I : I : : I : II 

Db 518 YSASSMALAIAAGQSWSVATLLMTI CFVFMMI FSGLLVNLTTI ASWL — SWLQYFS I PR 575 

Qy 597 WCFSGLMQIQFNGHLYTTQIG— NFTFSILGDTMI — SAMDLNSHPLYAI YLIVI 647 

: |: I :| I : : |: : |: : :||: |: :: : 

Db 57 6 YGFTALQHNEFLGQNFCPGLNATGNNPCNYA-TCTGEEYLVKQGIDLSPWGLWKNHVALA 634 

Qy 648 GISYGFLFLYYLSLKLIKQKS 668 

: I I : I I I : I : I 
Db 635 CMIVI FLTIAYLKLLFLKKYS 655 



RESULT 14 
US-09-961-086-1 

; Sequence 1, Application US/09961086 
; Publication No. US20030036645A1 



GENERAL INFORMATION: 
APPLICANT: UNIVERSITY OF MARYLAND, BALTIMORE 
APPLICANT: ROSS, Douglas D. 
APPLICANT: DOYLE, L. Austin 
APPLICANT: ABRUZZO, Lynne 

TITLE OF INVENTION: BREAST CANCER RESISTANCE PROTEIN (BCRP) AND THE DNA 
TITLE OF INVENTION: WHICH ENCODES IT 
FILE REFERENCE: EP19376-019 
CURRENT APPLICATION NUMBER: US/09/961,086 
CURRENT FILING DATE: 2001-09-21 
PRIOR APPLICATION NUMBER: US 60/073,763 
PRIOR FILING DATE: 1998-02-05 
PRIOR APPLICATION NUMBER: PCT/US99/ 02577 
PRIOR FILING DATE: 1999-02-05 
NUMBER OF SEQ ID NOS : 7 
SOFTWARE: PatentIn Ver. 2.1 
SEQ ID NO 1 
LENGTH: 655 
TYPE: PRT 

ORGANISM: Homo sapiens 
US-09-961-086-1 

Query Match 18.8%; Score 657.5; DB 10; Length 655; 

Best Local Similarity 27.2%; Pred. No. 1.4e-53; 

Matches 185; Conservative 141; Mismatches 270; Indels 85; Gaps 21; 

Q y 28 FSSESDNSL-YFTYSGQSNTLEVRDLTYQVDIASQVPWFEQLAQFKIPWRSHSSQDSCEL 86 

I : : I I II : I :: |:| : I :l I 
Db 20 FP AT AS N D L KAFT EGAVLSFHNICYRVKLKSGF LPCRKPVEKEI 63 

Qy 87 GIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVRK 146 

: | : : : : | : | | : | : | I : : I II I I : I : I I : I I I I I 

D b 64 -LSNINGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPSGL-SGDVLINGAPRPANF K 118 

Qy 147 C-VAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVI7VELRLRQCANTRV 205 

| : | I I : : Mill I I I :|| I : ::::h Mill: l — l 
Db 119 CNSGYWQDDVVMGTLTWENLQFSAALRIATTMTNHEKNERINRVIQELGLDKVADSKV 178 

Qy 206 GNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLV 265 

| : : | | | | | I II : I I II : : I : M I I II I I I : I II I I I : : : I I : : I I = 
Db 179 GTQFIRGVSGGERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTI 238 

Qy 266 LISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPADFYVDLT 325 

: 1 = 1111 IMIII : I: II : : I I I : : I I I I : I |:IIMI::|: 
Db 239 I FSIHQPRYS I FKLFDSLTLLASGRLMFHGPAQEAiGYFESAGYHCEAYNNPADFFLDI I 298 

Qy 326 SIDRRS KEREVATVEK AQSLAALFLEKVQGFDDFL WKAEAKELN 369 

: I : II: I : I I : : : I Ml =M 

Db 299 NGDSTAVALNREEDFKATEIIEPSKQDKPLIEKLAEIYVN SSFYKETKAELHQLS 353 

Qy 370 TSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLI 429 

:|: :: |: | : :| I : : :: M 

Db 354 GGEKKKKITVFKEI SYTTS FCHQLRWSKRSFKNLLGNPQASIAQII VTWLGLV 408 

Qy 430 IGFLYYGHGAKQLS FMDTAALLFMI GALI P FNVI LDWSKCHS ERSMLYY 479 

I I : I : I : I : I I : : : I I I : : : 

Db 409 IGAIYFGLKNDSTGIQNRAGVLFFL TTNQCFSSVSAVELFWEKKLFIH 457 



Qy 480 ELEDGLYTAGPYFFAKILGE-LPEHCAWIIYAMPIYWLTNLRPVPELFLLHFLLWLW 538 

I II I I I : I : I I II: : I : : I : I : I : : : I 

Db 458 EYISGYYRVSSYFLGKLLSDLLPMTMLPSIIFTCIWFMLGLKPKADAFFVMMFTLMMVA 517 

Qy 539 FCCRTMAI^SAMLPTFHMSSFFCNALYNSFYLTAGmiNLDNL--WIVPAWISKLSFLR 596 

: : I I I I : I : : : : : : I : : I I : I : : I : I I 

Db 518 YSAS SMALAI AAGQS WS VATLLMT I CFVFMMI FS GLLVNLTT I ASWL — SWLQYFS I PR 575 

Qy 597 WCFSGLMQIQFNGHLYTTQIG NFT FS I LGDTMI — SAMDLNSHPLYAI YLI VI 647 

: I : I : I I : : I : : I : : : I I : I : :: : 

Db 576 YGFTALQHNEFLGQNFCPGLNATGNNPCNYA-TCTGEEYLVKQGIDLSPWGLWKNHVALA 634 

Qy 648 GISYGFLFLYYLSLKLIKQKS 668 

: I I : II I : I : I 
Db 635 CMI VI FLT I AYLKLLFLKKYS 655 



RESULT 15 
US-10-405-806-13 

; Sequence 13, Application US/10405806 

; Publication No. US20030232362A1 

; GENERAL INFORMATION: 

; APPLICANT: KOMATANI, HIDEYA 

; APPLICANT: HARA, YOSHIKAZU 

; APPLICANT : KOTANI, HIDEHITO 

; APPLICANT: NAKAGAWA, RINAKO 

; TITLE OF INVENTION: DRUG RESISTANT GENE AND USE THEREOF 

; FILE REFERENCE: 234985US0CONT 

; CURRENT APPLICATION NUMBER: US/10/405,806 

; CURRENT FILING DATE: 2003-04-03 

; PRIOR APPLICATION NUMBER: PCT/JP01/08112 

; PRIOR FILING DATE: 2001-09-18 

; PRIOR APPLICATION NUMBER: JP2000-303441 

; PRIOR FILING DATE: 2000-10-03 

; NUMBER OF SEQ ID NOS: 17 

; SOFTWARE: PatentIn version 3.2 

; SEQ ID NO 13 

; LENGTH: 655 

TYPE: PRT 
; ORGANISM: Artificial Sequence 

FEATURE: 

OTHER INFORMATION: ABCG2 482T mutant sequence 
US-10-405-806-13 

Query Match 18.8%; Score 657.5; DB 15; Length 655; 

Best Local Similarity 27.2%; Pred. No. 1.4e-53; 

Matches 185; Conservative 141; Mismatches 270; Indels 85; Gaps 21 

Qy 28 FSSESDNSL-YFTYSGQSNTLEVRDLTYQVDIASQVPWFEQLAQFKI PWRSHSSQDSCEL 8 6 

I : : I I I I : I : : I : I : I : I I : : 
Db 20 FPATASNDLKAFT EGAVLSFHNICYRVKLKSGF LPCRKPVEKEI 63 

Qy 87 GIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVRK 14 6 

: I : : : : I : I I : I : I I : : I I I II : I : I I : I I I I I 

Db 64 -LSNINGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPSGL-SGDVLINGAPRPANF— K 118 



Qy 147 C-VAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRV 205 

| : | | | : : Mill I I I :ll I : -"h II M I = I— I 
Db 119 CNSGYWQDDVWGTLTVRENLQFSAALRLATTMTNHEKNERINRVIQELGLDKVADSKV 178 

Qy 2 06 GNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLV 265 

| ::||||||||:| |||::|: : I I I I I I I I : I I I I I I : : : I hM I = 
Db 179 GTQFIRGVSGGERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTI 238 

Qy 266 LISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPADFYVDLT 325 

: |:|||| ||:||| : I: II : : I I I : : I I I I : I |:lllll::|: 
Db 239 I FS IHQPRYS I FKLFDS LTLLASGRLMFHGPAQEALGYFESAGYHCEAYNNPADFFLDI I 298 



Qy 326 SIDRRS KEREVATVEK AQSLAALFLEKVQGFDDFL — WKAEAKELN 369 

: | : ||: I : I I : : : I I I I : I : 

Db 299 NGDSTAVALNREEDFKATEIIEPSKQDKPLIEKLAEIYVN— S S FYKETKAELHQLS 353 

Qy 370 TSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLI 429 

:|: I: | : :| | : : :: |: 

Db 354 GGEKKKKITVFKEI S YTTS FCHQLRWVS KRS FKNLLGNPQAS I AQI I VTWLGLV 408 

Qy 430 I GFLYYGHGAKQLS FMDTAALLFMI GALI P FNVI LDWSKCHS ERSMLYY 479 

| I : I : I : | : | | : :: I I I : : : 

Db 4 0 9 iGAIYFGLKNDSTGIQNRAGVLFFL TTNQCFSSVSAVELFWEKKLFIH 457 

Qy 48 0 ELEDGLYTAGPYFFAKILGE-LPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLW 538 

| || II 1:1 : II II: :|:: |:| : I : : :| 

Db 458 EYISGYYRVSSYFLGKLLSDLLPMTMLPSIIFTCIVYFMLGLKPKADAFFVMMFTLMMVA 517 

Qy 539 FCCRTMALAASAMLPTFHMS SFFCNALYNS FYLTAGFMINLDNL — WIVPAWI SKLS FLR 596 

: : | | | 1 :| ::: : : : | : : I I : I : : I : II 

Db 518 YSAS SMALAIAAGQSVVSVATLLMTICFVFMMI FSGLLVNLTTIASWL SWLQYFSIPR 575 

Q y 597 WCFSGLMQIQFNGHLYTTQIG NFTFSILGDTMI — SAMDLNSHPLYAIYLIVI 647 

: | : | : I I : : | : : | : : : I I : | : : : : 

Db 57 6 YGFTALQHNEFLGQNFCPGLNATGNNPCNYA-TCTGEEYLVKQGIDLSPWGLWKNHVALA 634 

Qy 648 GISYGFLFLYYLSLKLIKQKS 668 

: II : I I I : I : I 
Db 635 CMI VI FLT I AYLKLLFLKKYS 655 



Search completed: February 27, 2004, 07:34:05 
Job time : 32.1994 secs 



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM protein - protein search, using sw model 

Run on: February 27, 2004, 06:40:43 ; Search time 37.3051 Seconds 

(without alignments) 
5683.620 Million cell updates/sec 



Title:          US-09-989-981A-4 
Perfect score:  3494 
Sequence:       1 MAEKTKEETQLWNGTVLQDA..........FLFLYYLSLKLIKQKSIQDW 672 



Scoring table: BLOSUM62 

Gapop 10.0 , Gapext 0.5 

Searched: 1017041 seqs, 315518202 residues 

Total number of hits satisfying chosen parameters: 1017041 



Minimum DB seq length: 0 

Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0% 

Maximum Match 100% 
Listing first 45 summaries 



Database :  SPTREMBL_25:* 
             1:  sp_archea:* 
             2:  sp_bacteria:* 
             3:  sp_fungi:* 
             4:  sp_human:* 
             5:  sp_invertebrate:* 
             6:  sp_mammal:* 
             7:  sp_mhc:* 
             8:  sp_organelle:* 
             9:  sp_phage:* 
            10:  sp_plant:* 
            11:  sp_rodent:* 
            12:  sp_virus:* 
            13:  sp_vertebrate:* 
            14:  sp_unclassified:* 
            15:  sp_rvirus:* 
            16:  sp_bacteriap:* 
            17:  sp_archeap:* 



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
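
The run header above names a "sw model" (Smith-Waterman local alignment) with the BLOSUM62 scoring table, Gapop 10.0 and Gapext 0.5. The following is a minimal sketch of how a local score with affine gap costs can be computed, not GenCore's implementation. Two things are assumptions: a gap of length k is charged gap_open + k * gap_ext (the report does not say whether the first gap position also pays the extension cost), and `toy_score` is a stand-in substitution function, since the BLOSUM62 table is too large to reproduce here.

```python
def toy_score(x, y):
    """Stand-in substitution score (NOT BLOSUM62): +5 match, -4 mismatch."""
    return 5.0 if x == y else -4.0

def local_score(a, b, score=toy_score, gap_open=10.0, gap_ext=0.5):
    """Best local alignment score with affine gaps (Gotoh-style recurrences)."""
    neg = float("-inf")
    n, m = len(a), len(b)
    # H: best score of an alignment ending at (i, j); floor of 0 makes it local.
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    # E: best score ending with a gap that consumes residues of b only.
    E = [[neg] * (m + 1) for _ in range(n + 1)]
    # F: best score ending with a gap that consumes residues of a only.
    F = [[neg] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either open a new gap from H or extend an existing one.
            E[i][j] = max(H[i][j - 1] - gap_open - gap_ext, E[i][j - 1] - gap_ext)
            F[i][j] = max(H[i - 1][j] - gap_open - gap_ext, F[i - 1][j] - gap_ext)
            diag = H[i - 1][j - 1] + score(a[i - 1], b[j - 1])
            H[i][j] = max(0.0, diag, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best

# Identical sequences score 5 per residue under the toy table.
print(local_score("ACDEF", "ACDEF"))
# Seven matches bridged by one gap of length 2: 35 - (10 + 2 * 0.5).
print(local_score("ACDEFGHIK", "ACDGHIK"))
```

With half-point extension costs, scores such as 672.5 or 3478.5 in the summaries are what one would expect from alignments that contain an odd number of gap positions.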



SUMMARIES 

                  % 
Result          Query 
   No.   Score  Match  Length  DB  ID       Description 

     1    3494  100.0     672  11  Q7TSR6   Q7tsr6 mus musculu 
     2    3484   99.7     672  11  Q7TSR7   Q7tsr7 mus musculu 
     3  3478.5   99.6     673  11  Q8R543   Q8r543 mus musculu 
     4    3225   92.3     672  11  Q8CIQ5   Q8ciq5 rattus norv 
     5     782   22.4     648  10  Q9C6W5   Q9c6w5 arabidopsis 
     6     777   22.2     646  10  Q9C6R7   Q9c6r7 arabidopsis 
     7     756   21.      668  10  Q9ARU4   Q9aru4 oryza sativ 
     8   749.5   21.5     725  10  Q9ZU35   Q9zu35 arabidopsis 
     9   749.5   21.5     725  10  Q9ASR9   Q9asr9 arabidopsis 


x u 


7 1 f) 

/ X VJ 


9 0 

Z \J ■ 




672 


10 


^/ *J J_l ± U £, 


r)91"ift9 arflbi d on sis 


1 1 

X X 


7D9 S 


20 . 




652 


11 


07TSR8 


Q7tsr8 mus musculu 


12 


695 . 5 


19 . 


g 


687 


5 


Q9NH94 


Q9nh94 bombyx mori 


X o 


6QS 

U -7 -J 


1 9 

X _/ . 


q 

-7 


8 01 


5 


Q8T691 


08t691 dictvosteli 


14 


677 . 5 


19 . 


4 


657 


11 


O80W57 


Q80w57 rattus norv 




677 s 


1 9 
x y • 


4 


657 


11 


Oft 0ST1 


Q80stl rattus norv 


1 6 
X o 


67 ? S 


X _7 • 




657 


11 


07TMS 5 


07tms5 mus musculu 


1 7 

X / 


67 ^ S 


1 Q 

X -7 • 


3 


662 


10 


Q949Y4 


Q949y4 arabidopsis 


1 ft 

X o 


672 . 5 


19 . 


2 


657 


11 


O9R004 


Q9r004 mus musculu 


1 Q 

X J 


679 S 


19 . 


2 


662 


10 


084TH5 

y/ y~f i ilw 


Q84th5 arabidopsis 


Z u 


671 S 


1 9 

X J • 


9 
z 


6S7 


11 
ALIGNMENTS 



RESULT 1 
Q7TSR6 

ID Q7TSR6 PRELIMINARY; PRT; 672 AA. 

AC Q7TSR6; 

DT 01-OCT-2003 (TrEMBLrel . 25, Created) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 



DE ATP-binding cassette sub-family G member 8. 

GN ABCG8 . 

OS Mus musculus (Mouse) . 

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi ; 

OC Mammalia; Eutheria; Rodentia; Sciurognathi ; Muridae; Murinae; Mus. 

OX NCBI_TaxID=10090; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=PERA/Ei; TISSUE=Liver ; 

RA Wittenburg H., Lyons M.A. , Li R. , Churchill G.A. , Carey M.C., 

RA Paigen B. ; 

RT "Primary Roles of FXR and ABCG5/ABCG8 in Cholesterol Gallstone 

RT Susceptibility: Evidence from a Cross of PERA/Ei and I/Ln Inbred 

RT Mice."; 

RL Submitted (DEC-2002) to the EMBL/ GenBank/DDBJ databases. 

DR EMBL; AY196216; AAO45096.1; -. 

KW ATP-binding. 

SQ SEQUENCE 672 AA; 75867 MW; CAB720502EA8FE21 CRC64; 

Query Match 100.0%; Score 3494; DB 11; Length 672; 

Best Local Similarity 100.0%; Pred. No. 1.5e-255; 

Matches 672; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy 1 MAEKTKEETQLWNGTVLQDASGLQDSLFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIAS 60 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I 
Db 1 MAEKTKEETQLWNGTVLQDASGLQDSLFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIAS 60 

Qy 61 QVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120 

I I I I I I I I I I I I I I I I I II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I 
Db 61 QVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120 

Qy 121 RGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFS 180 

I I I I I I I I I I I I I I I I I I I I I I II I I I I I I II I I I I I I I I I I I I I I I I I II I II M I I I I 
Db 121 RGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFS 180 

Qy 181 QAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPT 240 

I I I I I I I II I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 181 QAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPT 240 

Qy 241 SGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQM 300 

I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 241 SGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQM 300 

Qy 301 VQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFL 360 

I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 301 VQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFL 360 

Qy 361 WKAEAKELNTSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHG 420 

I I I I I I I I I I I I I I I I I I I II I I I I I I II I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 
Db 361 WKAEAKELNTSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHG 420 

Qy 421 SEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDVVSKCHSERSMLYYE 480 

I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I 
Db 421 SEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDVVSKCHSERSMLYYE 480 



Qy 481 LEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLVVFC 540 

I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I 

Db 481 LEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLVVFC 540 



Qy 541 CRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFS 600 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I M 
Db 541 CRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFS 600 

Qy 601 GLMQIQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYYLS 660 

I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 601 GLMQIQFNGHLYTTQI GNFTFS I LGDTMI SAMDLNSHPLYAI YLIVI GI S YGFLFLYYLS 660 

Qy 661 LKLIKQKSIQDW 672 

I I I I I I I I I I I I 
Db 661 LKLIKQKSIQDW 672 



RESULT 2 
Q7TSR7 

ID Q7TSR7 PRELIMINARY; PRT; 672 AA. 
AC Q7TSR7; 
DT 01-OCT-2003 (TrEMBLrel. 25, Created) 
DT 01-OCT-2003 (TrEMBLrel. 25, Last sequence update) 
DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 
DE ATP-binding cassette sub-family G member 8. 
GN ABCG8. 
OS Mus musculus (Mouse). 
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 
OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus. 
OX NCBI_TaxID=10090; 
RN [1] 
RP SEQUENCE FROM N.A. 
RC STRAIN=I/LnJ; TISSUE=Liver; 
RA Wittenburg H., Lyons M.A., Li R., Churchill G.A., Carey M.C., 
RA Paigen B.; 
RT "Primary Roles of FXR and ABCG5/ABCG8 in Cholesterol Gallstone 
RT Susceptibility: Evidence from a Cross of PERA/Ei and I/Ln Inbred 
RT Mice."; 
RL Submitted (DEC-2002) to the EMBL/GenBank/DDBJ databases. 
DR EMBL; AY196215; AAO45095.1; -. 
KW ATP-binding. 
SQ SEQUENCE 672 AA; 75805 MW; E5B30B5890200A41 CRC64; 

Query Match 99.7%; Score 3484; DB 11; Length 672; 
Best Local Similarity 99.7%; Pred. No. 8.4e-255; 
Matches 670; Conservative 0; Mismatches 2; Indels 0; Gaps 0; 



Qy 1 MAEKTKEETQLWNGTVLQDASGLQDSLFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIAS 60 

I II I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1 MAEKTKEETQLWNGTVLQDASGLQDSLFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIAS 60 

Qy 61 QVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120 

I I I I I I I I I I I I I I I I I I I I I I I I II I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 61 QVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120 

Qy 121 RGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFS 180 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 121 RGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFS 180 

Qy 181 QAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPT 240 

I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 181 QAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPT 240 



Qy 241 SGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQM 300 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
Db 241 S GL D S FT AHN L VT T L S RLAKGN RL VL I S LHQ P RS D I FRL FD L VL LMT S GT P I Y L GAAQQM 300 

Qy 301 VQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFL 360 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 301 VQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFL 360 

Qy 361 WKAEAKELNTSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHG 420 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 361 WKAEAKELNTSTHTVSLTLTQDTDCGTAAELPGMIEQFSTLIRRQISNDFRDLPTLLIHG 420 

Qy 421 S EACLMS LI I GFLYYGHGAKQLS FMDTAALLFMI GAL I P FNVI LDWS KCHS ERSMLYYE 480 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 421 S EACLMS LI I GFLYYGHGAKQLS FMDTAALLFMI GAL IP FNVI LDVVS KCHS ERSMLYYE 480 

Qy 481 LEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLWFC 540 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I II I I I I I I I I I II 

Db 481 LEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHLLLVWLWFC 540 

Qy 541 CRTMALAASAMLPTFHMSS FFCNALYNS FYLTAGFMINLDNLWIVPAWI SKLS FLRWCFS 600 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I 
Db 541 CRTMALAASAMLPTFHMS S FFCNALYNS FYLTAGFMINLDNLWIVPAWI SKLS FLRWCFS 600 

Qy 601 GLMQIQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYYLS 660 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 
Db 601 GLMQIQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYYLS 660 

Qy 661 LKLIKQKSIQDW 672 

I I I I I I I I I I I I 
Db 661 LKLIKQKSIQDW 672 



RESULT 3 
Q8R543 



ID Q8R543 PRELIMINARY ; PRT; 673 AA. 

AC Q8R543; 

DT 01-JUN-2002 (TrEMBLrel. 21, Created) 

DT 01-JUN-2002 (TrEMBLrel. 21, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE Sterolin 2. 

GN ABCG8.

OS Mus musculus (Mouse).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.

OX NCBI_TaxID=10090;

RN [1]

RP SEQUENCE FROM N.A.

RC STRAIN=129/Sv;

RA Lu K., Zhou Y., Lee M.-H., Patel S.B.;

RT "Molecular cloning, genomic structure and characterization of novel 

RT mouse head-to-head tandem ABC transporters."; 

RL Submitted (FEB-2001) to the EMBL/GenBank/DDBJ databases.

DR EMBL; AF351804; AAL82898.1; JOINED.

DR EMBL; AF351805; AAL82898.1; JOINED.

DR EMBL; AF351807; AAL82898.1; JOINED.

DR EMBL; AF351808; AAL82898.1; JOINED.

DR EMBL; AF351809; AAL82898.1; JOINED.

DR EMBL; AF351810; AAL82898.1; JOINED.

DR GO; GO:0016020; C:membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

SQ SEQUENCE 673 AA; 76008 MW; FA08340445DF259C CRC64; 



Query Match 99.6%; Score 3478.5; DB 11; Length 673; 

Best Local Similarity 99.7%; Pred. No. 2.2e-254; 

Matches 671; Conservative 0; Mismatches 1; Indels 1; Gaps 1; 

Qy 1 MAEKTKEETQLWNGTVLQDAS-GLQDSLFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIA 59
     ||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||
Db 1 MAEKTKEETQLWNGTVLQDASQGLQDSLFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIA 60

Qy 60 SQVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVIT 119
      ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 61 SQVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVIT 120

Qy 120 GRGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTF 179
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 121 GRGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTF 180

Qy 180 SQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEP 239
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 181 SQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEP 240

Qy 240 TSGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQ 299
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 241 TSGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQ 300

Qy 300 MVQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDF 359
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 301 MVQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDF 360

Qy 360 LWKAEAKELNTSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIH 419
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 361 LWKAEAKELNTSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIH 420

Qy 420 GSEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDVVSKCHSERSMLYY 479
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 421 GSEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDVVSKCHSERSMLYY 480

Qy 480 ELEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLVVF 539
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 481 ELEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLVVF 540

Qy 540 CCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCF 599
       ||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 541 CCRNMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCF 600

Qy 600 SGLMQIQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYYL 659
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 601 SGLMQIQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYYL 660

Qy 660 SLKLIKQKSIQDW 672
       |||||||||||||
Db 661 SLKLIKQKSIQDW 673



RESULT 4
Q8CIQ5

ID Q8CIQ5 PRELIMINARY; PRT; 672 AA.
AC Q8CIQ5;
DT 01-MAR-2003 (TrEMBLrel. 23, Created)
DT 01-MAR-2003 (TrEMBLrel. 23, Last sequence update)
DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update)
DE Sterolin 2.
GN ABCG8.
OS Rattus norvegicus (Rat).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Rattus.
OX NCBI_TaxID=10116;
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=Sprague-Dawley;
RA Yu H., Lu K., Lee M., Pandit B., Patel S.B.;
RT "The rat Abcg5 and Abcg8: characterization, chromosomal assignment and
RT genetic variation in sitosterolemic rats.";
RL Submitted (AUG-2002) to the EMBL/GenBank/DDBJ databases.
DR EMBL; AY145899; AAN64276.1; -.
DR GO; GO:0016020; C:membrane; IEA.
DR GO; GO:0005524; F:ATP binding; IEA.
DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.
DR GO; GO:0006810; P:transport; IEA.
DR InterPro; IPR003439; ABC_transporter.
DR Pfam; PF00005; ABC_tran; 1.
DR ProDom; PD000006; ABC_transporter; 1.
DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.
DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.
SQ SEQUENCE 672 AA; 75906 MW; 2FE0846E71BD9D47 CRC64;



Query Match 92.3%; Score 3225; DB 11; Length 672; 

Best Local Similarity 91.2%; Pred. No. 3.2e-235; 

Matches 613; Conservative 29; Mismatches 30; Indels 0; Gaps 0;



Qy 1 MAEKTKEETQLWNGTVLQDASGLQDSLFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIAS 60
     ||||||||||||||||||||| ||||:||||||||||||||||||||||||||||||:||
Db 1 MAEKTKEETQLWNGTVLQDASSLQDSVFSSESDNSLYFTYSGQSNTLEVRDLTYQVDMAS 60



Qy 61 QVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120 

I I I I I I I I I I I I : I I I I I I I I : I I I I 1 I I I I I I I I I I I I I I I I : I I I I I : I I I II I I 
Db 61 QVPWFEQLAQFKLPWRSRGSQDSWDLGIRNLSFKVRSGQMLAIIGSAGCGRATLLDVITG 120 

Qy 121 RGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFS 180 

I I I I I I I I I I I I I I I I I I I I I I :: I I I II I I I I I I I I I I I I I I I I 11111111:111 
Db 121 RDHGGKMKSGQIWINGQPSTPQLIQKCVAHVRQQDQLLPNLTVRETLTFIAQMRLPKTFS 180 

Qy 181 QAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPT 240
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 181 QAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPT 240

Qy 241 SGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQM 300 

I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I II I 
Db 241 SGLDSFTAHNLVRTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGVAQHM 300

Qy 301 VQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFL 360 

I I I I II I I : I I I I I I I I I I I I I I I I I I I I I I I I : I I I I : I I I : I I I I I I I I I I I I I I I I 
Db 301 VQYFTSIGYPCPRYSNPADFYVDLTSIDRRSKEQEVATMEKARLLAALFLEKVQGFDDFL 360 

Qy 361 WKAEAKELNTSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHG 420 

I I I I I I I : I I : II I I II I I : I I I I II I I I I : I I : I I I I I I I I I I I I I I I I I III 
Db 361 WKAEAKSLDTGTYAVSQTLTQDTNCGTAAELPGMIQQFTTLIRRQISNDFRDLPTLFIHG 420 

Qy 421 SEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDVVSKCHSERSMLYYE 480

: I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I : I I II 
Db 421 AEACLMSLIIGFLYYGHADKPLSFMDMAALLFMIGALIPFNVILDVVSKCHSERSLLYYE 480

Qy 481 LEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLVVFC 540

I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I : I I I I I I 
Db 481 LEDGLYTAGPYFFAKVLGELPEHCAYVIIYGMPIYWLTNLRPGPELFLLHFMLLWLVVFC 540

Qy 541 CRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFS 600

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I : I I I I I I I I 
Db 541 CRTMALAASAMLPTFHMSSFCCNALYNSFYLTAGFMINLNNLWIVPAWISKMSFLRWCFS 600

Qy 601 GLMQIQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYYLS 660 

I I I I I I I I I I : I I I I I I I III: II I :: I I I I I I I I I I I I I I I I I I I I III I I I I I 

Db 601 GLMQIQFNGHIYTTQIGNLTFSVPGDAMVTAMDLNSHPLYAIYLIVIGISCGFLSLYYLS 660 

Qy 661 LKLIKQKSIQDW 672

II I I I I I I I I I 

Db 661 LKFIKQKSIQDW 672 



RESULT 5
Q9C6W5

ID Q9C6W5 PRELIMINARY; PRT; 648 AA.

AC Q9C6W5;

DT 01-JUN-2001 (TrEMBLrel. 17, Created)

DT 01-JUN-2001 (TrEMBLrel. 17, Last sequence update)

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update)

DE Hypothetical protein (ABC transporter, putative).

GN F27M3_2 OR AT1G31770/F27M3_2.

OS Arabidopsis thaliana (Mouse-ear cress).


OC Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; 

OC Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; rosids;

OC eurosids II; Brassicales; Brassicaceae; Arabidopsis.

OX NCBI_TaxID=3702;

RN [1]

RP SEQUENCE FROM N.A.

RC STRAIN=cv. Columbia;

RX MEDLINE=21016719; PubMed=11130712; 

RA Theologis A., Ecker J.R., Palm C.J., Federspiel N.A., Kaul S.,

RA White O., Alonso J., Altafi H., Araujo R., Bowman C.L., Brooks S.Y.,

RA Buehler E., Chan A., Chao Q., Chen H., Cheuk R.F., Chin C.W.,

RA Chung M.K., Conn L., Conway A.B., Conway A.R., Creasy T.H., Dewar K.,

RA Dunn P., Etgu P., Feldblyum T.V., Feng J.-D., Fong B., Fujii C.Y.,

RA Gill J.E., Goldsmith A.D., Haas B., Hansen N.F., Hughes B., Huizar L.,

RA Hunter J.L., Jenkins J., Johnson-Hopson C., Khan S., Khaykin E.,

RA Kim C.J., Koo H.L., Kremenetskaia I., Kurtz D.B., Kwan A., Lam B.,

RA Langin-Hooper S., Lee A., Lee J.M., Lenz C.A., Li J.H., Li Y.-P.,

RA Lin X., Liu S.X., Liu Z.A., Luros J.S., Maiti R., Marziali A.,

RA Militscher J., Miranda M., Nguyen M., Nierman W.C., Osborne B.I.,

RA Pai G., Peterson J., Pham P.K., Rizzo M., Rooney T., Rowley D.,

RA Sakano H., Salzerg S.L., Schwartz J.R., Shinn P., Southwick A.M.,

RA Sun H., Tallon L.J., Tambunga G., Toriumi M.J., Town C.D.,

RA Utterback T., Van Aken S., Vaysberg M., Vysotskaia V.S., Walker M.,

RA Wu D., Yu G., Fraser C.M., Venter J.C., Davis R.W.;

RT "Sequence and analysis of chromosome 1 of the plant Arabidopsis 

RT thaliana."; 

RL Nature 408:816-820(2000). 

RN [2] 

RP SEQUENCE FROM N.A. 

RA Haas B.J., Volfovsky N., Town C.D., Troukhan M., Alexandrov N.,

RA Feldmann K.A., Flavell R.B., White O., Salzberg S.L.;

RT "Full-length messenger RNA sequences greatly improve genome

RT annotation.";

RL Genome Biol. 0:0-0(2002).

RN [3] 

RP SEQUENCE FROM N.A. 

RA Brover V., Troukhan M., Alexandrov N., Lu Y.-P., Flavell R.,

RA Feldmann K.;

RT "Full-Length cDNA from Arabidopsis thaliana.";

RL Submitted (MAR-2002) to the EMBL/GenBank/DDBJ databases.

RN [4] 

RP SEQUENCE FROM N.A. 

RC STRAIN=cv. Columbia;

RA Seki M., Iida K., Satou M., Sakurai T., Akiyama K., Ishida J.,

RA Nakajima M., Enju A., Kamiya A., Narusaka M., Carninci P., Kawai J.,

RA Hayashizaki Y., Shinozaki K.;

RT "Arabidopsis thaliana full-length cDNA.";

RL Submitted (NOV-2002) to the EMBL/GenBank/DDBJ databases.

DR EMBL; AC074360; AAG60152.1; -. 

DR EMBL; AY088793; AAM67104.1; -.

DR EMBL; AK117530; BAC42192.1; -.

DR GO; GO:0016020; C:membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW Hypothetical protein. 

SQ SEQUENCE 648 AA; 72618 MW; D52A2D2434A5BB9D CRC64; 



Query Match 22.4%; Score 782; DB 10; Length 648; 

Best Local Similarity 31.1%; Pred. No. 1.7e-50; 

Matches 216; Conservative 130; Mismatches 266; Indels 82; Gaps 22;

Qy 1 MAEKTKEETQLWNGTVLQDASGLQDSLFSSESDNSLYF-TYSGQSN TLEVRD 51 

: I : : I : I : : I I I I : : : I : I I I : I II:: 
Db 6 IAPRPEED GGVMVQ GLPD-MSDTQSKSVLAFPTITSQPGLQMSMYPITLKFEE 57 

Qy 52 LTYQVDIASQVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGR 111 

: I : I I I I : I I : I : : : : I I : I I : : I I I I : 

Db 58 VVYKVKI EQTSQCMGSWKSKE KTILNGITGMVCPGEFLAMLGPSGSGK 105 

Qy 112 ASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIA 171 

: I I : I I I II:: MM: : I : I I I M : I M II I I I 
Db 106 TTLLSALGGR — LSKTFSGKVMYNGQPFSGCIKRR-TGFVAQDDVLYPHLTVWETLFFTA 162 

Qy 172 QMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNP 231

: I M : : : : : : I : I I I I I I : I M : I M : I I I I : : I I I M : : I M 

Db 163 LLRLPSSLTRDEKAEHVDRVIAELGLNRCTNSMIGGPLFRGISGGEKKRVSIGQEMLINP 222 

Qy 232 GILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTP 291

: I : I M I II M II III MM: III I I I : : M II I I : Ml MM: Ml 
Db 223 SLLLLDEPTSGLDSTTAHRIWTIKRLASGGRTVVTTIHQPSSRIYHMFDKVVLLSEGSP 282 

Qy 2 92 IYLGAAQQMVQYFTSIGHPCPRYSNPADFYVDLTS 1 D RRS KE RE VAT VE KAQ S LA 34 6 

Mill I M I : I : I I I I I Ml: : : I : I I I : : M 

Db 283 IYYGAASSAVEYFSSLGFSTSLTVNPADLLLDLANGIPPDTQKETSEQEQKTVK--ETLV 34 0 

Qy 347 ALFLEKVQGFDDFLWKAEAKELNTSTHTVSLT LTQDTDCGTAVELPGMIEQFST 400 

: : : : I M : I M : I I : I I I I : 

Db 341 SAYEKNIS TKLKAELCNAESHSYEYTKAAAKNLKSEQWCTT WWYQFTV 38 8 

Qy 401 LIRRQI-SNDFRDLPTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIP 459 

MM: I II : :| I : I I I I I II 

Db 389 LLQRGVRERRFESFNKLRIF QVI S VAFLGGLLWWHT PKS -HI QDRTALLFFFSVFWG 444 

Qy 460 FNVILDWSKCHSERSMLYYELEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTN 519 

I : : I I : II I Ml II I : : M I I I : MM 

Db 445 FYPLYNAVFTFPQEKRMLIKERSSGMYRLSSYFMARNVGDLPLELALPTAFVFIIYWMGG 504 

Qy 520 LRPVPELFLLHFLLVWLWFCCRTMALAASAMLPTFHMS S FFCNALYNS FYLTAGFMINL 579 

MM Ml Ml I : : I I Ml : : : I : M : 

Db 505 LKPDPTTFILSLLWLYSVLVAQGLGLAFGALLMNIKQATTLASVTTLVFLIAGGYYVQQ 564 

Qy 580 DNLWIVPAWISKLSFLRWCFSGLMQIQFNGHLY TTQI GNFT FS I LGDTMI SAM 632 

: I I M II : : M I : I M I : : M I I : I 

Db 565 IPPFIV— WLKYLSYSYYCYKLLLGIQYTDDDYYECSKGVWCRVGDF PAIKSM 615 

Qy 633 DLNSHPLYAIYLIVIGIS-YGFLFLYYLSLKLIK 665 

IM I : MM M : MM M 

Db 616 GLNN LW I DVFVMGVMLVG YRLMAYMALHRVK 64 6 



RESULT 6 
Q9C6R7 

ID Q9C6R7 PRELIMINARY; PRT; 646 AA. 

AC Q9C6R7; 

DT 01-JUN-2001 (TrEMBLrel. 17, Created) 

DT 01-JUN-2001 (TrEMBLrel. 17, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE ABC transporter, putative. 

GN F5M6.22. 

OS Arabidopsis thaliana (Mouse-ear cress) . 

OC Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; 

OC Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; rosids;

OC eurosids II; Brassicales; Brassicaceae; Arabidopsis.

OX NCBI_TaxID=3702;

RN [1]

RP SEQUENCE FROM N.A.

RC STRAIN=cv. Columbia;

RX MEDLINE=21016719; PubMed=11130712;

RA Theologis A., Ecker J.R., Palm C.J., Federspiel N.A., Kaul S.,

RA White O., Alonso J., Altafi H., Araujo R., Bowman C.L., Brooks S.Y.,

RA Buehler E., Chan A., Chao Q., Chen H., Cheuk R.F., Chin C.W.,

RA Chung M.K., Conn L., Conway A.B., Conway A.R., Creasy T.H., Dewar K.,

RA Dunn P., Etgu P., Feldblyum T.V., Feng J.-D., Fong B., Fujii C.Y.,

RA Gill J.E., Goldsmith A.D., Haas B., Hansen N.F., Hughes B., Huizar L.,

RA Hunter J.L., Jenkins J., Johnson-Hopson C., Khan S., Khaykin E.,

RA Kim C.J., Koo H.L., Kremenetskaia I., Kurtz D.B., Kwan A., Lam B.,

RA Langin-Hooper S., Lee A., Lee J.M., Lenz C.A., Li J.H., Li Y.-P.,

RA Lin X., Liu S.X., Liu Z.A., Luros J.S., Maiti R., Marziali A.,

RA Militscher J., Miranda M., Nguyen M., Nierman W.C., Osborne B.I.,

RA Pai G., Peterson J., Pham P.K., Rizzo M., Rooney T., Rowley D.,

RA Sakano H., Salzerg S.L., Schwartz J.R., Shinn P., Southwick A.M.,

RA Sun H., Tallon L.J., Tambunga G., Toriumi M.J., Town C.D.,

RA Utterback T., Van Aken S., Vaysberg M., Vysotskaia V.S., Walker M.,

RA Wu D., Yu G., Fraser C.M., Venter J.C., Davis R.W.;

RT "Sequence and analysis of chromosome 1 of the plant Arabidopsis 

RT thaliana."; 

RL Nature 408:816-820(2000). 

DR EMBL; AC079041; AAG50724.1; -. 

DR PIR; C86441; C86441. 

DR GO; GO:0016020; C:membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.

DR GO; GO:0000166; F:nucleotide binding; IEA.

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW ATP-binding. 

SQ SEQUENCE 646 AA; 72342 MW; 7A9624F82FD88A6E CRC64; 



Query Match 22.2%; Score 777; DB 10; Length 646;



Best Local Similarity 30.9%; Pred. No. 3.9e-50; 

Matches 214; Conservative 132; Mismatches 266; Indels 80; Gaps 22; 



Qy 1 MAEKTKEETQLWNGTVLQDASGLQDSLFSSESDNSLYF-TYSGQSN TLEVRDLT 53 

:| : :|: I ::| I I I : : : I : I I I : I : :::: 

Db 6 IAPRPEED GGVMVQ GLPD-MSDTQSKSVLAFPTITSQPGLQMSMYPITLKEW 57 

Qy 54 YQVDIASQVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRAS 113 

I : I I I I : I I : I : : : : I I : I I : : I I I I : : 

Db 58 YKVKI EQTSQCMGSWKSKE KTILNGITGMVCPGEFLAMLGPSGSGKTT 105 

Qy 114 LLDVITGRGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQM 173 

II : II I II:: III : : |: I I I I I : I I I I I I I I : 

Db 106 LLSALGGR — LSKTFSGKVMYNGQPFSGCIKRR-TGFVAQDDVLYPHLTVWETLFFTALL 162 

Qy 174 RL P RT F S QAQ RDKRVE DVI AE LRL RQ CANT RVGNT YVRGVS GGE RRRVS I GVQLLWN P GI 233 

Ml : : : : : : I : I I I I I I : I I : : I I I : I I I I : : I I I I I : : I II : 

Db 163 RLPSSLTRDEKAEHVDRVIAELGLNRCTNSMIGGPLFRGISGGEKKRVSIGQEMLINPSL 222 

Qy 234 LILDEPTSGLDSFTAHNLWTLSRL7VKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIY 293 

I : I I I I I I I I I I III : I I I : III I I I : -III I I : : I I I : I I : I I I 
Db 223 LLLDEPTSGLDSTTAHRIVTTIKRLASGGRTVVTTIHQPSSRIYHMFDKWLLSEGSPIY 2 82 

Qy 2 94 LGAAQQMVQ Y FT SIGHPCPRYSN PAD FYVD LT S 1 D RRS KE REVAT VE KAQ S LAAL 348 

III I : I I : I : I I I I I : I I : : : I : I I I : —I : 

Db 283 YGAASSAVEYFSSLGFSTSLTWPADLLLDLANGIPPDTQKETSEQEQKTVK — ETLVSA 340 

Qy 349 FLEKVQGFDDFLWKAEAKELNTSTHTVSLT LTQDTDCGTAVELPGMIEQFSTLI 402 

: : : I : I : I : I : I I : I I I I : I : 

Db 341 YEKNIS TKLKAELCNAESHSYEYTKAAAKNLKSEQWCTT WWYQFTVLL 388 

Qy 403 RRQI-SNDFRDLPTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFN 461 

: I : I II : : I I : I I I I I I I I 

Db 389 QRGVRERRFES FNKLRI F QVISVAFLGGLLWWHTPKS-HIQDRTALLFFFSVFWGFY 444 

Qy 462 VILDWSKCHSERSMLYYELEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLR 521 

: : I I : I I I I : I I I I : : I : I I I : III: I : 

Db 445 PLYNAVFTFPQEKRMLIKERSSGMYRLSSYFMARNVGDLPLELALPTAFVFIIYWMGGLK 504 

Qy 522 PVPELFLLHFLLVWLVVFCCRTMALAASAMLPTFHMS S FFCNALYNS FYLTAGFMINLDN 581 

I I I : I I : I I : : I I I : I : : : | : | : : 

Db 505 PDPTTFILS LLWLYSVLVAQGLGLAFGALLMNI KQATTIAS WTLVFLIAGGYYVQQI P 564 

Qy 582 LWIVPAWISKLSFLRWCFSGLMQIQFNGHLY TTQI GNFTFS I LGDTMI SAMDL 634 

: I I I : ||: : I : I : I I : I : : I : I I : I I 

Db 565 PFIV — WLKYLSYSYYCYKLLLGIQYTDDDYYECSKGVWCRVGDF PAIKSMGL 615 

Qy 635 NSHPLYAIYLIVIGIS-YGFLFLYYLSLKLIK 665 

I : I : I : I : I : : I : : I : I 

Db 616 NN LWIDVFVMGVMLVGYRLMAYMALHRVK 644 



RESULT 7 
Q9ARU4 

ID Q9ARU4 PRELIMINARY; PRT; 668 AA. 

AC Q9ARU4 ; 

DT 01-JUN-2001 (TrEMBLrel. 17, Created) 



DT 01-JUN-2001 (TrEMBLrel. 17, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE Putative ABC transporter. 

GN P0445D12.3. 

OS Oryza sativa (Rice) . 

OC Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; 

OC Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; 

OC Ehrhartoideae; Oryzeae; Oryza. 

OX NCBI_TaxID=4530; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=cv. Nipponbare; 

RA Sasaki T., Matsumoto T., Yamamoto K. ; 

RT "Oryza sativa nipponbare (GA3) genomic DNA, chromosome 1, PAC

RT clone:P0445D12.";

RL Submitted (DEC-2000) to the EMBL/ GenBank/DDBJ databases. 

CC -!- SIMILARITY: BELONGS TO THE ABC TRANSPORTER FAMILY. 

DR EMBL; AP003046; BAB40032.1; -. 

DR Gramene; Q9ARU4; -. 

DR GO; GO:0016020; C:membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.

DR GO; GO:0000166; F:nucleotide binding; IEA.

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW ATP-binding; Transport. 

SQ SEQUENCE 668 AA; 73368 MW; D1875B8C75B0F3B2 CRC64; 

Query Match 21.6%; Score 756; DB 10; Length 668; 

Best Local Similarity 30.8%; Pred. No. 1.6e-48; 

Matches 183; Conservative 120; Mismatches 248; Indels 44; Gaps 9; 

Qy 88 IRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGR — GHGGKMKSGQIWINGQPSTPQLVR 145 

: I I : : I I : : I I : : I I I I : : I I : I : I : I I : : I II : I : : 

Db 77 LSNASGEAKSGRLLALMGPSGSGKTTLLNVLAGQLTASPSLHLSGFLYINGRPISEGGYK 136 

Qy 146 KCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRV 205

: I : I I I I I I I I I I I : I : : : I I I : : : : I I : : II I I : : I 

Db 137 — IAYVRQEDLFFSQLTVRETLSLAAELQLRRTLTPERKESYVNDLLFRLGLVNCADSIV 194 

Qy 206 GNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLV 265

I : I I I : I I I I : : I : I : : I : : I I : I I I I : I I I : I I : : II : I I : I 
Db 195 GDAKVRGISGGEKKRLSIACELIASPSIIFADEPTTGLDAFQAEKVMETLRQLAEDGHTV 254

Qy 266 LISLHQPRSDIFRLFDLVLLMTSGTPIYLG-AAQQMVQYFTSIGHPCPRYSNPADFYVDL 324 

: I : I I I I : : I I : : I :: I I I : I I : : : I I I : I : I I : I I I : I II 
Db 255 ICSIHQPRGSVYGKFDDIVLLSEGEVIYMGPAKEEPLLYFASLGYHCPDHVNPAEFLADL 314 

Qy 325 TSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLTLTQDTD 384 

hill Ill : : : : I III: 

Db 315 ISVDYSSAESVQSSRKRI ENLIEEFSNKVA ITESNSSLTNPEGSEFSPKLIQKS- 368 



Qy 385 CGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGFLYYGHGAKQLSF 444 

I I I I I : I Mill: : : : I I : : : I I I 

Db 369 --TTKHRRGWWRQFRLLFKRAWMQAFRDGPTNKVRARMSVASAIIFGSVFWRMGKTQTSI 426 

Qy 445 MDTAALLFMIGALIPFNVILDWSKCHSERSMLYYELEDGLYTAGPYFFAKILGELPEHC 504 

I II : : I II::: I I I III :|:| |:| 

Db 427 QDRMGLLQVTAINTAMAALTKTVGVFPKERAIVDRERAKGSYALGPYLSSKLLAEIPIGA 486 

Qy 505 AYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLVVFCCRTMALAASAMLPTFHMSSFFCNA 564

I : : I : : I : : I I I : I : I I I MM : : 

Db 4 87 AFPLIFGSILYPMSKLHPTFSRFAKFCGIWVESFAASAMGLTVGAMAPTTEAAMALGPS 54 6 

Qy 565 LYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQIQFNG HLYTTQIGN 618 

I I : I : : I I I : : I I I : I : I I I I I Ml II II 

Db 547 LMTVFIVFGGYYVNPDNTPVIFRWIPKVSLIRWAFQGLCINEFKGLQFEQQHSYDIQTGE 606 

Qy 619 FT FSI LGDTMI SAMDLNSHPLYAI YLI VI GI S YGFLFLYYLSLKLI KQ 666 

II: : II::: : : I I : I : I : I : 

Db 607 QALERFSLGGIRIADTLVAQGRI LMFWYWLTYLLLKK 643 



RESULT 8 
Q9ZU35 

ID Q9ZU35 PRELIMINARY; PRT; 725 AA. 

AC Q9ZU35; 

DT 01-MAY-1999 (TrEMBLrel. 10, Created) 

DT 01-MAY-1999 (TrEMBLrel. 10, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE Putative ABC transporter. 

GN AT2G01320. 

OS Arabidopsis thaliana (Mouse-ear cress). 

OC Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; 

OC Spermatophyta; Magnoliophyta; eudicotyledons ; core eudicots; rosids; 

OC eurosids II; Brassicales; Brassicaceae; Arabidopsis. 

OX NCBI_TaxID=3702; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=cv. Columbia; 

RX MEDLINE=20083487; PubMed=10617197 ; 

RA Lin X., Kaul S., Rounsley S.D., Shea T.P., Benito M.-I., Town C.D.,

RA Fujii C.Y., Mason T.M., Bowman C.L., Barnstead M.E., Feldblyum T.V.,

RA Buell C.R., Ketchum K.A., Lee J.J., Ronning C.M., Koo H., Moffat K.S.,

RA Cronin L.A., Shen M., VanAken S.E., Umayam L., Tallon L.J., Gill J.E.,

RA Adams M.D., Carrera A.J., Creasy T.H., Goodman H.M., Somerville C.R.,

RA Copenhaver G.P., Preuss D., Nierman W.C., White O., Eisen J.A.,

RA Salzberg S.L., Fraser C.M., Venter J.C.;

RT "Sequence and analysis of chromosome 2 of the plant Arabidopsis 

RT thaliana."; 

RL Nature 402:761-768(1999). 

RN [2] 

RP SEQUENCE FROM N.A. 

RC STRAIN=cv. Columbia; 

RA Lin X. ; 

RL Submitted (MAR-2000) to the EMBL/GenBank/DDBJ databases. 

CC -!- SIMILARITY: BELONGS TO THE ABC TRANSPORTER FAMILY. 

DR EMBL; AC006200; AAD14532.1; -. 



DR PIR; C84423; C84423. 

DR GO; GO:0016020; C:membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.

DR GO; GO:0000166; F:nucleotide binding; IEA.

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW ATP-binding; Transport. 

SQ SEQUENCE 725 AA; 78899 MW; 7DB2E556FE3553D7 CRC64 ; 



Query Match 21.5%; Score 749.5; DB 10; Length 725; 

Best Local Similarity 29.2%; Pred. No. 5.5e-48; 

Matches 180; Conservative 128; Mismatches 260; Indels 49; Gaps 10; 

Qy 73 IPWR SHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120 

III I I I : : I : I : : I : : I I I : I I I I : : I I : I : I 

Db 70 IRWRNITCSLSDKSSKSVl^FLLKNVSGEAKPGRLLAIMGPSGSGKTTLLNVIiAGQLSLSP 129 

Qy 121 RGHGGKMKS GQ I W I NGQ P S T PQLVRKCVAHVRQHDQLL PNLT VRET LAFI AQMRL PRT FS 180 

II I I : : I I : I I : : : : I I I I I I II I I I I : I I : : : I I I 

Db 130 RLH LS GLLEVNGKPS S SKAYK — LAFVRQEDLFFSQLTVRETLS FAAELQLPEI S S 183 

Qy 181 QAQ RD KRVE DVI AE LRL RQ CANT RVGNT YVRGVS GGERRRVS I GVQL LWN P G I L I LD E P T 240 



Db 184 AEERDEYVNNLLLKLGLVS CADS CVGDAKVRGI S GGEKKRLS LACELI AS P S VI FADEPT 243 

Qy 241 SGLDS FTAHNLVTT LS RLAKGNRLVLI S LHQPRS DI FRLFDLVLLMTS GTP I YLG-AAQQ 299 

: I I I : I I : : I I : I I : I : I : I I I I : : I I : : I : I I I : I I I : : 

Db 244 TGLDAFQAEKVMETLQKLAQDGHTVICSIHQPRGSVYAKFDDIVLLTEGTLVYAGPAGKE 303 

Qy 300 MVQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDF 359 

: I I : I I I : I I I : I I I I : I II : : : : : I I 
Db 304 PLTYFGNFGFLCPEHWPAEFLADLISVDYSSSETVYSSQKRV11ALVDAF 353 

Qy 360 LWKAEAKELNTSTHTVSLTLTQDTDCGTAVELPGMIE QFSTLIRRQISNDFRD 412 

:: :: : I:: ::| I ::| II |::| II 

Db 354 SQRSSSVLYATPLSMKEETKNGMRPRRKAIVERTDGWWRQFFLLLKRAWMQASRD 4 08 

Qy 413 LPTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTATOiLFMIGALIPFNVILDWSKCHS 472 

II: : ::| I ::: I I I I I I : : I 

Db 409 GPTNKVRARMSVASAVI FGSVFWRMGKSQTSIQDRMGLLQVAAINTAMAALTKTVGVFPK 468 

Qy 473 ERSMLYYELEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFL 532 

II::: I I I : I I I : I : I : I I : : : : : I : II I 
Db 469 ERAIVDRERSKGSYSLGPYLLSKTIAEIPIGAAFPLMFGAVLYPMARLNPTLSRFGKFCG 528 

Qy 533 LVWLWFCCRTMALAASAMLPT FHMS S FFCNAL YNS FYLTAGFMINLDNLWI VPAWI S KL 592 

: I : I II I I : I : : : I I : I : : I I I I : I I : 

Db 529 IVTVES FAASAMGLTVGAMVPSTEAAMAVGPSLMTVFI VFGGYYVNADNTPI I FRWI PRA 588 



QY 



593 SFLRWCFSGLMQI QFNGHLYTTQI GNFTFS I -LGDTMI SAMDLNSHPLYAI YLI VI GI S Y 651 



Db 589 SLIRWAFQGLCINEFSGLKFDHQ NTFDVQTGEQALERLSFGGRRIRE TIAAQS 641 

Qy 652 GFLFLYYLSLKLIKQKS 668 

I : I : I : : I : 
Db 642 RI LMFWYSAT YLLLEKN 658 



RESULT 9 
Q9ASR9 

ID Q9ASR9 PRELIMINARY; PRT; 725 AA. 

AC Q9ASR9; 

DT 01-JUN-2001 (TrEMBLrel. 17, Created)

DT 01-JUN-2001 (TrEMBLrel. 17, Last sequence update)

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE At2g01320/F10A8.20. 

OS Arabidopsis thaliana (Mouse-ear cress) . 

OC Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;

OC Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; rosids;

OC eurosids II; Brassicales; Brassicaceae; Arabidopsis. 

OX NCBI_TaxID=3702; 

RN [1] 

RP SEQUENCE FROM N.A.

RA Cheuk R., Chen H., Kim C.J., Meyers M.C., Shinn P., Banh J.,

RA Bowser L., Carninci P., Chung M.K., Goldsmith A.D., Hayashizaki Y.,

RA Ishida J., Jones T., Kamiya A., Karlin-Neumann G., Kawai J., Lam B.,

RA Lee J.M., Lin J., Liu S.X., Miranda M., Narusaka M., Nguyen M.,

RA Palm C.J., Pham P.K., Quach H.L., Sakano H., Sakurai T., Satou M.,

RA Seki M., Southwick A., Toriumi M., Yamada K., Yu G., Shinozaki K.,

RA Davis R.W., Theologis A., Ecker J.R.;

RT "Arabidopsis cDNA clones.";

RL Submitted (MAR-2001) to the EMBL/GenBank/DDBJ databases.

RN [2] 

RP SEQUENCE FROM N.A. 

RA Cheuk R., Chen H., Kim C.J., Shinn P., Banh J., Bowser L.,

RA Carninci P., Chang E., Dale J.M., Goldsmith A.D., Hayashizaki Y.,

RA Ishida J., Jones T., Kamiya A., Karlin-Neumann G., Kawai J., Lam B.,

RA Lee J.M., Lin J., Miranda M., Narusaka M., Nguyen M., Onodera C.S.,

RA Palm C.J., Quach H.L., Sakurai T., Satou M., Seki M., Southwick A.,

RA Tang C.C., Toriumi M., Wu H.C., Yamada K., Yamamura Y., Yu G., Yu S.,

RA Shinozaki K., Davis R.W., Theologis A., Ecker J.R.;

RT "Arabidopsis ORF clones.";

RL Submitted (JUL-2002) to the EMBL/GenBank/DDBJ databases.

CC -!- SIMILARITY: BELONGS TO THE ABC TRANSPORTER FAMILY. 

DR EMBL; AF367318; AAK32 905.1; -. 

DR EMBL; AY133617; AAM91447.1; 

DR GO; GO: 0016020; C:membrane; IEA. 

DR GO; GO: 0005524; F: ATP binding; IEA. 

DR GO; GO: 0004009; F: ATP-binding cassette (ABC) transporter acti. . .; IEA. 

DR GO; GO: 0006810; P: transport; IEA. 

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1. 

DR ProDom; PD000006; ABC_transporter; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW ATP-binding; Transport. 

SQ SEQUENCE 725 AA; 78998 MW; 68A7E556FE2FE3D7 CRC64; 



Query Match 21.5%; Score 749.5; DB 10; Length 725; 

Best Local Similarity 29.2%; Pred. No. 5.5e-48; 

Matches 180; Conservative 128; Mismatches 260; Indels 49; Gaps 10; 

Qy 73 IPWR SH S SQDS CELGI RNLS FKVRS GQMLAI I GS S GCGRAS LLDVI TG 120 

III I I I : : | : | : : | : : | | | : | | | | : : M : I : I 

Db 70 IRWRNITCSLSDKSSKSVT^FLLKNVSGEAKPGRLLAIMGPSGSGKTTLLNVLAGQLSLSP 129 

Qy 121 RGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFS 180 

II I I : : I I : I I : : : : I I I I I I I I I I I I : I I : : : I I I 

Db 130 RLH LSGLLEVNGKPSSSKAYK— LAFVRQEDLFFSQLTVRETLSFAAELQLPEISS 183 

Qy 181 QAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPT 240

: I I : I : : : : I I I I : : I I : I I I : I I I I : : I : I : : I : : I : : I I I I 
Db 184 AEERDEYVNNLLLKLGLVSCADSCVGDT^KVRGISGGEKKRLSLACELIASPSVIFADEPT 243 

Qy 241 SGLDSFTAHNLWTLSRLAKGNRLVT.ISLHQPRSDIFRLFDLVTiLMTSGTPIYLG-AAQQ 299 

: I I I : I I : : I I : II : I : I : I I I I : : I I : : I : I I I : I I I : : 

Db 244 TGLDAFQAEKV>IETLQKLAQDGHTVICSIHQPRGSWAKFDDI VTjLTEGTLVTAGPAGKE 303 

Qy 300 MVQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDF 359 

: I I : I I I : I I I : I I I I : I II : : : : : I I 
Db 304 PLTYFGNFGFLCPEHWPAEF1AI)LISVDYSSSETWSSQKRWALVDAF 353 

Qy 360 LWKAEAKELNTSTHTVSLTLTQDTDCGTAVELPGMIE QFSTLIRRQISNDFRD 412 

: : : : : I : : : : I I : : I I I I : : I II 

Db 354 SQRSSSVLYATPLSMKEETKNGMRPRRKAIVERTDGWWRQFFLLLKRAWMQASRD 408 

Qy 413 LPTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDWSKCHS 472 

II: : : : I I : : : I I I I I I : : I 

Db 409 GPTNKVRARMSVASAVIFGSVFWRMGKSQTSIQDRMGLLQVAAINTAMAALTKTVGVFPK 468

Qy 473 ERSMLYYELEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFL 532 

II::: I I I : I I I : I : I : I I : : : : : I : II I 
Db 469 ERAIVDRERSKGSYSLGPYLLSKTIAEIPIG7^AFPLMFGAVLYPMARLNPTLSRFGKFCG 528 

Qy 533 LVWLWFCCRTMALAASAMLPT FHMS S FFCNAL YNS FYLTAGFMINLDNLWI VPAWI S KL 592 

: I : I II I I : I : : : I I : I : : I I I I : I I : 

Db 529 IVTVESFAASAMGLTVGAMVPSTEAAMAVGPSLMTVFIVFGGYYVNADNTPIIFRWIPRA 588

Qy 593 SFLRWCFSGLMQIQFNGHLYTTQIGNFTFSI-LGDTMISAMDLNSHPLYAIYLIVIGISY 651

I : I I I I I : I : I : I I I : I : : : : I 
Db 589 SLIRWAFQGLCINEFSGLKFDHQ NTFDVQTGEQALERLSFGGRRIRE TIAAQS 641 

Qy 652 GFLFLYYLSLKLIKQKS 668 

I :| : I: :|: 
Db 642 RILMFWYSATYLLLEKN 658



RESULT 10 
Q9LI82 

ID Q9LI82 PRELIMINARY; PRT; 672 AA. 

AC Q9LI82; 

DT 01-OCT-2000 (TrEMBLrel. 15, Created) 

DT 01-OCT-2000 (TrEMBLrel. 15, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 



DE ABC transporter-like protein. 

OS Arabidopsis thaliana (Mouse-ear cress).

OC Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;

OC Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; rosids;

OC eurosids II; Brassicales; Brassicaceae; Arabidopsis. 

OX NCBI_TaxID=3702;

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=Columbia; 

RA Kaneko T., Kato T., Sato S., Nakamura Y., Asamizu E., Tabata S.; 

RL Submitted (MAR-2000) to the EMBL/GenBank/DDBJ databases.

RN [2] 

RP SEQUENCE FROM N.A. 

RC STRAIN=Columbia; 

RX MEDLINE=20363099; PubMed=10907853;

RA Nakamura Y.;

RT "Structural analysis of Arabidopsis thaliana chromosome 3. II. 

RT Sequence features of the regions of 4,251,695 bp covered by ninety P1,

RT TAC and BAC clones.";

RL DNA Res. 7:217-221(2000). 

DR EMBL; AP001313; BAB03081.1; -.

DR GO; GO:0016020; C:membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.

DR GO; GO:0000166; F:nucleotide binding; IEA.

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003593; AAA_ATPase. 

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR006162; Ppantne_S.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

DR PROSITE; PS00012; PHOSPHOPANTETHEINE; 1.

KW ATP-binding. 

SQ SEQUENCE 672 AA; 75269 MW; 20B2D99215600135 CRC64;



Query Match 20.3%; Score 710; DB 10; Length 672; 

Best Local Similarity 28.6%; Pred. No. 4.8e-45; 

Matches 209; Conservative 120; Mismatches 233; Indels 170; Gaps 25; 

Qy 19 DASGLQDSLFSSES DNSLYFTYSGQSN TLEVRDLTYQ 55 

: I : I :: I I I I II :. I I : I : : I I I 

Db 21 ETSPVQENRFSSPSHVNPCLDDDND HDGPSHQSRQSSVLRQSLRPIILKFEELTY- 75 

Qy 56 VDIASQVP WFEQLAQFKIPWRSHSSQD SCELGIRNLSFKVRSGQMLAI 103 

III II II: III I : I :: I I : 

Db 7 6 -SIKSQTGKGSYWF GSQEPKPNRLVLKCVSGI VKPGELLAM 115 

Qy 104 IGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTV 163 

: I I I I : : I : : I I I I I : II : I I : I I I I I I hill 

Db 116 LGP S GS GKTTL VTALAGRLQG — KLSGTVSYNGEPFTSSVKRK-TGFVTQDDVLYPHLTV 172 

Qy 164 RETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSI 223 

I | | : | : I I I : : : : : : : I I I : : : I I : I I : : I : I I : I I I I I : I I I I 
Db 173 METLTYTALLRLPKELTRKEKLEQVEMWSDLGLTRCCNSVIGGGLIRGISGGERKRVSI 232 



Qy 224 GVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLV 283 

I :: I I I : I : I I I I I I I I I I I I : I I I I I : I I I : : : I I I I : : I : I I I 
Db 233 GQEMLVNPSLLLLDEPTSGLDSTTAARIVATLRSLARGGRTWTTIHQPSSRLYRMFDKV 292 

Qy 284 LLMTSGTPIYLGAAQQMVQYFTSIGH-PCPRYSNPADFYVDLTS 326 

I : : : I III I : : z : : I I III: I : II I I I : I I : 
Db 293 LVLSEGCPIYSGDSGRVMEYFGSIGYQPGSSFWPADFVLDLANGITSDTKQYDQIETNG 352 

Qy 327 -IDRRSKEREVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLTLTQDTDC 385 

: I I :: I III : : I : : I I I I I I 

Db 353 RLDRLEEQNSV KQSL ISSYKKNLYPPLKEE VSRTFPQDQ — 391 

Qy 386 GTAVEL PGMIEQFSTLIRRQIS NDFRDLPTLLIHGSEACLMSLIIG 431 

I I I II I : : I : II : : : I I : I 

Db 392 -TNARLRKKAITNRWPTSWWMQFSVLLKRGLKERSHESFSGLRIFMVMS VSLLSG 445 

Qy 432 FLYYGHGAKQLSFMDTT^JVLLFMIGALIPFNVILDWSKCHSERSMLYYELEDGLYTAGPY 491 

I : : I I III I : : : I I I I I I : I I 

Db 446 LLWWHSRVAHL — QDQVGLLFFFSIFWGFFPLFNAI FTFPQERPMLIKERSSGIYRLSSY 503 

Qy 492 FFAKILGELPEHCAWIIYAMPIYWLTNLRPVPELFLLHFLLWLWFCCRTMALAASAM 551 

: I : : I : I I I : I I : I : I I : : : : I I : : I I I : 

Db 504 YIARTVGDLPMELILPTIFVTITYWMGGLKPSLTTFIMTIKIVLYNAALVAQGVGIAIjGAI 563 

Qy 552 LPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVP AWISKLSFLRWCFSGLMQIQFN 608 

I :: : I II I : I : I I I : : I I : I : I : : I : 

Db 564 LMDAKKAATLSSVLMLVFLLAGGYYIQ HIPGFIAWLKYVSFSHYCYKLLVGVQYT 618 

Qy 609 GH . LYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGIS 650 

: : I I I : : I I : I : : 
Db 619 WDEVYECGS GLHCS VMDYEGI KNLRI GNMMWDVLA LAVMLLL 660 

Qy 651 YGFLFLYYLSLK 662 

: I I I : I : 
Db 661 — YRVLAYLALR 670 



RESULT 11 
Q7TSR8 

ID Q7TSR8 PRELIMINARY; PRT; 652 AA. 

AC Q7TSR8; 

DT 01-OCT-2003 (TrEMBLrel. 25, Created) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE ATP-binding cassette sub-family G member 5. 

GN ABCG5.

OS Mus musculus (Mouse).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.

OX NCBI_TaxID=10090; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=I/LnJ; TISSUE=Liver;

RA Wittenburg H., Lyons M.A., Li R., Churchill G.A., Carey M.C.,

RA Paigen B.;

RT "Primary Roles of FXR and ABCG5/ABCG8 in Cholesterol Gallstone 



RT Susceptibility: Evidence from a Cross of PERA/Ei and I/Ln Inbred 

RT Mice."; 

RL Submitted (DEC-2002) to the EMBL/GenBank/DDBJ databases.

DR EMBL; AY195872; AAO45093.1; -. 

KW ATP-binding. 

SQ SEQUENCE 652 AA; 73236 MW; 0125FB617DE296B9 CRC64; 



Query Match 20.1%; Score 702.5; DB 11; Length 652; 

Best Local Similarity 29.6%; Pred. No. 1.7e-44; 

Matches 196; Conservative 125; Mismatches 253; Indels 89; Gaps 18; 

Qy 24 QDSLFSSESDNS L Y FT YS GQ SNT LEVRDLT YQVD I AS QV- PWFEQLAQ FKI PWRSH S 79 

I I : :| : : I I : : I I : : : : I II I I 

Db 27 QGSVTGTEARHSLGVLHVSYS VSNRVGPW WNIKS 60 

Qy 80 SQDSCELGI-RNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQP 138 

I : I : : : I : I I I : : I : I I I I I : : I I I I : I I I : : : I I 

Db 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRCTGTLEGDVFVNGCE 120 

Qy 139 STPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLR 198 

: I : : I I I I : I I I I I I I : I : I I : I : | : | | | : I I I 
Db 121 LRRDQFQDCFSYvliQSDVFLSSLTVRETLRYTAMLALCRS-SADFYNKKV^VNTELSLS 17 9 

Qy 199 QCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRL 258 

I : : I : I : I I I I I I I I I I I I : I : : : I I I I I : I I I I I : : I hi 
Db 180 HVADQVIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAEL 239 

Qy 259 AKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPA 318 

I: :|:|::::|||||::|: II : ::| I :: I : : I : : I : I : I I I : I I I 
Db 240 ARRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPF 299 

Qy 319 DFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLT 378 

I I I : I I I I : I : I : I I I : I : : I I : I I : I I : II 

Db 300 DFYMDLTSVDTQSREREIETYKRVQMLESAFKES-DIYHKILENIERARYLKTLPTVPFK 358 

Qy 379 LTQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGF — LYYG 436 

|:| III : 1:11 I |: ::: : :| I : I I 

Db 359 -TKDP PGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQ 409 

Qy 437 HGAKQLSFMDTAALLFMIGALIPFNVILDWSKCHSERSMLYYELEDGLYTAGPYFFAKI 496 

: : : I II : I : : I : I : I : : I : I I I I I : 

Db 410 NNTLKGAVQDRVGLLYQFVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYV 469

Qy 497 LGELPEHCAYVIIYAMPIYWLTNLRPVPELF LL — HFLLVWLWFCCRTMALAA 548 

III : I : : II II I I I I : : I hi 
Db 470 LHALPFSIIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLI GEFL TLVLLG 523

Qy 549 SAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQIQFN 608 



Db 524 IVQNPNI-VNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVWEFY 582 

Qy 609 GHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISYGFL 654

I III I h : | | : | | | : : II 
Db 583 GL NFTCGESNTTML NHPMCA ITQGVEFIEKTCPGATSRFTANFL 626 



Qy 655 FLY 657

        ||

Db 627 ILY 629



RESULT 12 
Q9NH94 

ID Q9NH94 PRELIMINARY; PRT; 687 AA. 

AC Q9NH94; 

DT 01-OCT-2000 (TrEMBLrel. 15, Created)

DT 01-OCT-2000 (TrEMBLrel. 15, Last sequence update)

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE ATP dependent transmembrane transporter protein. 

GN WH3.

OS Bombyx mori (Silk moth).

OC Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; Pterygota; 

OC Neoptera; Endopterygota; Lepidoptera; Glossata; Ditrysia; Bombycoidea;

OC Bombycidae; Bombyx. 

OX NCBI_TaxID=7091; 

RN [1] 

RP SEQUENCE FROM N.A.

RC STRAIN=Kin-Shiu X Sho-wa;

RX MEDLINE=20469043; PubMed=11016828;

RA Abraham E.G., Sezutsu H., Kanda T., Sugasaki T., Shimada T.,

RA Tamura T.;

RT "Identification and characterization of a silkworm ABC transporter 

RT gene homologous to Drosophila white."; 

RL Mol. Gen. Genet. 264:11-19(2000). 

CC -!- SIMILARITY: BELONGS TO THE ABC TRANSPORTER FAMILY. 

DR EMBL; AF229609; AAF61569.1; -. 

DR GO; GO:0016021; C:integral to membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.

DR GO; GO:0000166; F:nucleotide binding; IEA.

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR005284; Pigment_permease.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR TIGRFAMs; TIGR00955; 3a01204; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW ATP-binding; Transmembrane; Transport. 

SQ SEQUENCE 687 AA; 75835 MW; ECD336333F0981AB CRC64; 

Query Match 19.9%; Score 695.5; DB 5; Length 687; 
Best Local Similarity 29.2%; Pred. No. 6.2e-44; 

Matches 179; Conservative 119; Mismatches 269; Indels 45; Gaps 10; 

Qy 75 WRSHSS QDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQ 131 

I : : I I : I : I I : : I : : I I I : I I I I I : : I I : : I I I I : : I 

Db 88 WKNSSDRMFQQRKQL-LRNVNGAAYPGELLAIMGSSGAGKTTLLNTLTFRTPGGWATGT 146 

Qy 132 IWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDV 191 

: I I I I : I I : I : I : I I : I I I I I I I I : I : I M I I I I : : I 

Db 147 RALNGQPATPDALTALSAYVQQQDLFIGTLTVREHLVFQAMVRMDRHIPYAQRMKRVQEV 206 



Qy 192 IAELRLRQCANTRVG-NTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHN 250

111:111: ::|:|||| :|:| ::| :| :: II I I I I I I I I I I 

Db 207 IQELALSKCQNTVIGIPGRLKGISGGEMKRLSFASEVLTDPPLMFCDEPTSGLDSFMAQN 266 

Qy 251 LVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHP 310 

: : I ||: : I : : : I M I : : : : I I : I : I I : I I : : : I : I : I 
Db 267 VIQVLKGLAQKGKTWCTIHQPSSELYAMFDKLLIMADGRVAFLGSSDEAFQFFKELGAA 326 

Qy 311 CPRYSNPADFYVDLTS IDRRS KEREVATVEKAQSLAALFLE-KVQ 354 

M I I I I :: I : : I : : I: I : : I I : I I : I 

Db 327 CPANYNPADHFIQLLAGVPGREEVTRHTIDTVCTAFAKSEIGCRIAAEAENALYNERKIQ 386 

Qy 355 -GFDDFLWKAEAKELNTSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDL 413 

III :: | | : | | :: | : :: 

Db 387 AGLADAPW AMSSTTRAGRSPYKASWCTQFRAVLWRSWLSVTKEP 430 

Qy 414 PTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDWSKCHSE 473

: : : : : I : : I I : I : I I I : : I I : I I I : : II 

Db 431 MLIKVRFLQTIMVSILIGVIYFGQNLDQDGVMNINGAIFMFLTNMTFQNIFAVINVFCSE 490 

Qy 474 RSMLYYELEDGLYTAGPYFFAKILGELPEHCAWIIYAMPIYWLTNLRPVPELFLLHFLL 533 

: I |:| I II :| I I I |:: I I h I : I 

Db 491 LPIFIREHHSGMYRADVYFLSKTLAEAPVFATIPLVFTTIAYYMIGLNPDPKRFFIASGL 550 

Qy 534 WLWFCCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLW 593 

II : I : I : : : I I I I : I : I I : : I : I 
Db 551 AALVTNVAT S FG YL I S CAS S S VSMAAS VGP P HIP FML FGG FFLN S G S VPPYLSWIS 607 

Qy 594 FLRWCFSGLMQIQFNGHLYTTQIG N FT FS I LGDTMI S AMDLN S H P L YAI YL I VI GI 649 

: I I I : I I III | : : : : : : I : 

Db 608 YLSWFHYGNEALLINQWAGVETIACTRENFTCPASGQWLETLSFSQDDFAMDWNMILL 667 

Qy 650 SYGFLFLYYLSL 661 

II II I I : I 
Db 668 FVGFRFLAYLAL 679 



RESULT 13 
Q8T691 

ID Q8T691 PRELIMINARY; PRT; 801 AA. 

AC Q8T691; 

DT 01-JUN-2002 (TrEMBLrel. 21, Created) 

DT 01-JUN-2002 (TrEMBLrel. 21, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE ABC transporter AbcG1.

GN ABCG1.

OS Dictyostelium discoideum (Slime mold).

OC Eukaryota; Mycetozoa; Dictyosteliida; Dictyostelium.

OX NCBI_TaxID=44689; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=Ax4;

RA Anjard C., Loomis W.F.;

RT "Evolution of the ABC transporters of Dictyostelium.";

RL Submitted (FEB-2002) to the EMBL/GenBank/DDBJ databases.

CC -!- SIMILARITY: BELONGS TO THE ABC TRANSPORTER FAMILY. 

DR EMBL; AF482380; AAL91485.1; -.

DR GO; GO:0016020; C:membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.

DR GO; GO:0000166; F:nucleotide binding; IEA.

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003593; AAA_ATPase. 

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW ATP-binding; Transport. 

SQ SEQUENCE 801 AA; 90052 MW; CCC4F0036CB195A3 CRC64; 

Query Match 19.9%; Score 695; DB 5; Length 801; 

Best Local Similarity 29.5%; Pred. No. 8.3e-44; 

Matches 195; Conservative 123; Mismatches 242; Indels 102; Gaps 21; 

Qy 88 IRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGR-GHGGKMKSGQIWINGQPSTPQLVRK 146 

: I : : : I I : I I : I I I I : : I I I : : I I I I : : : I I I : : I 
Db 139 LTNINGHIESGTIFAIMGPSGAGKTTLLDILAHRLNING S GTMYLNGNKS D FN I FKK 195 

Qy 147 CVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRVG 206 

: I I I I : I : I I I I I I I I I I : : : I I : : : I I : I : I I : I : I I : I II 

Db 196 LCGYVTQSDSLMPSLTVRETLNFYAQLKMPRDVPLKEKLQRVQDIIDEMGLNRCADTLVG 255

Qy 207 — NTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRL 264 

: : I I : I I I I I I I I : I : : I I I : : : I I I I I I I I I : I : : : : I : I M I 
Db 256 TADNKIRGISGGERRRVTISIELLTGPSVILLDEPTSGLDASTSFYVMSALKKLAKSGRT 315 

Qy 265 VLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPADFYVDL 324

: : : : I I I I I : I : : I I : I I : I I I I I : : : I I : I : I : I I I I I : : I I 
Db 316 IICTIHQPRSNIYDMFDNLLLLGDGNTIYYGKANKALEYFNANGYHCSEKTNPADFFLDL 375

Qy 325 TSI D RRS KE RE VAT VE KAQ S LAAL FL E KV QGFDDF 359 

: I : I I : I : I : I : 

Db 376 INTQVEDQADSDDDDYNDEEEEIGGGGGGSGGGAGGIEDIGISISPTMNGSAVDNIKNNE 435

Qy 360 LWKAE AKELNTSTHTVSLTLTQD T 383 

III II I : : I I I 

Db 436 LKQQQQQQQQQQQSTDGRARRRIKKLTKEEMVILKKEYPNSEQGLRVNETLDNISKENRT 495 

Qy 384 DCG-TAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGFLYYGHGAKQL 442 

I 'I : I I I I : I ::: I I :: : I I : I : I I I I 

Db 496 DFKYEKTRGPNFLTQFSLLLGREVTNAKRHPMAFKVNLIQAIFQGLLCGIVYYQLGLGQS 555

Qy 443 SFMD-TAALLFMI-GALIP FNVI LDWS KCHS ERSML YYELEDGLYTAGP YFFAK 495 

I I : I : I I I : I I I :: : I : I : I hill 

Db 556 S VQ S RT GWAF 1 1 MGVS F P AVMS T I HVF P D VI T I FL KD RA SGVYDTLPFFLAK 608 

Qy 496 ILGELPEHCAYVI IYAMPI YWLTNLRPVP ELFLLHFLLVWLWFCCRTMALAA 548 

: I I : : I : I I : I I I I I I : : I I : : : 

Db 609 SEm>ACIAVLLPMVTATIVYW]^^ 665 

Qy 549 SAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNL — WIVPAWISKLSFLRWCFSGLMQIQ 606 

|: :| : : : |:| :|| III::: |:| I :|| |: : 



Db 666 SSSVPNVQVGTAVAPLIVILFFLFSGFFINLNDVPGWLV--WFPYISFFRYMIEAAVTNA 723

Qy 607 FNGHLYT TQIGNFTFSILGDTMISAMDLN-SHPLYAIYLIVIGISYGFLFLYYLSL 661

Db 724 FKDVHFTCTDSQKIGGVCPVQYGNNVIENMGYDIDHFWRNVWILVLYI-IGFRVLTFLVL 782

Qy 662 KL 663

Db 783 KL 784



RESULT 14 

Q80W57 

ID Q80W57 PRELIMINARY; PRT; 657 AA.

AC Q80W57;

DT 01-JUN-2003 (TrEMBLrel. 24, Created) 

DT 01-JUN-2003 (TrEMBLrel. 24, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE ABC transporter ABCG2.

GN ABCG2.

OS Rattus norvegicus (Rat).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Rattus. 

OX NCBI_TaxID=10116; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=wistar; 

RA Hori S., Ohtsuki S., Terasaki T.; 

RT "Expression and regulation of ABCG2 at the rat blood-brain barrier."; 

RL Submitted (MAR-2003) to the EMBL/GenBank/DDBJ databases.

DR EMBL; AB105817; BAC76396.1; -. 

DR GO; GO:0016020; C:membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.

DR GO; GO:0000166; F:nucleotide binding; IEA.

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003593; AAA_ATPase. 

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR006162; Ppantne_S.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

DR PROSITE; PS00012; PHOSPHOPANTETHEINE; 1.

SQ SEQUENCE 657 AA; 72960 MW; C975C61A08489027 CRC64; 

Query Match 19.4%; Score 677.5; DB 11; Length 657; 
Best Local Similarity 28.4%; Pred. No. 1.3e-42; 

Matches 181; Conservative 130; Mismatches 251; Indels 75; Gaps 16; 

Qy 91 LSF KVRSGQML AIIGSSGCGRASLLDVITGRG 122 

II! I t : I I :: II : I : I I : : M I I I : I 

Db 37 LSFHHITYRVKWSGFLWKT7VEKEILSDINGIMKPGLNAILGPTGGGKSSLLDVLAAR- 95 

Qy 123 HGGKMKSGQIWINGQPSTPQLVRKCVA-HVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQ 181 

: I I : I I I I M : : I I I : : I I II I I I I : I I I : 

Db 96 KDPRGLSGDVLINGAPQPANF--KCSSGYWQDDWMGTLTVRENLQFSAALRLPKAMKT 153 



Qy 182 AQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTS 241

: : : : I : : I II I : I : : : I I : I I : I I I I I : I I I I : : I : : I II Mill:
Db 154 HEKNERINTIIKELGLDKVADSKVGTQFTRGISGGERKRTSIGMELITDPSILFLDEPTT 213

Qy 242 GLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMV 301

I I I I I I : : : I l-l I : : I : I I I I I I : I I I : I : I I : : I I I : :
Db 214 GLDSSTANAVLLLLKRMSKQGRTIIFSIHQPRYSIFKLFDSLTLLASGKLMFHGPAQKAL 273

Qy 302 QYFTSIGHPCPRYSNPADFYVDLTS IDRRSKEREVATVEKAQSLAALFLEKVQ 354

: I I I I : I I : I I I II : : I : : : : I : : I I : : I :

Db 274 EYFASAGYHCEPYNNPADFFLDVINGDSSAVMLNRGEQDHEANKTEEPSKREKPIIENLA 333

Qy 355 GF--DDFLWKAEAKELNTSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQISNDFRD 412

I : : : I I : I : I : I I : I : I I : :

Db 334 EFYINSTIYGETKAELD QLPVAQKKKGSSAFREPVYVTSFCHQLRWIARRSFKN 387

Qy 413 L PTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVIL 464

I I : : : : I I I I I I : I : I : I : I :

Db 388 LLGNPQASVAQLIV TVILGLIIGALYFGLKNDPTGMQNRAGVFFFLTTNQCFTSV- 442

Qy 465 DWSKCHSERSMLYYELEDGLYTAGPYFFAKILGE-LPEHCAYVIIYAMPIYWLTNLRPV 523

I I : : : I II I I I I : : : I I : I I : I : : I :

Db 443 SAVELFWEKKLFIHEYISGYYRVSSYFFGKLVSDLLPMRFLPSVIYTCILYFMLGLKRT 502

Qy 524 PELFLLHFLLWLWFCCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNL- 582

II: : : I : : I I I I : I : : : : I : I : : I I :

Db 503 VEAFFIMMFTLIMVAYTASSMALAIAAGQSWSVATLL^4TISFVFMMLFSGLLWLRTI^ 562

Qy 583 -WIVPAWISKLSFLRWCFSGLMQIQFNGH LYTTQIGNFTFSILGDTMIS-A 631

I : : I : I I : I : I : I I : : | : : : | | : | :

Db 563 PWL--SWLQYFSIPRYGFTALQHNEFLGQEFCPGLNVTMNSTCVNSYTICTGNDYLINQG 620

Qy 632 MDLNSHPLYAIYLIVIGISYGFLFLYYLSLKLIKQKS 668

: I I : I : : : : : I I : I I I : I : I
Db 621 IDLSPWGLWRNHVALACMIIIFLTIAYLKLLFLKKYS 657



RESULT 15 
Q80ST1 



ID Q80ST1 PRELIMINARY; PRT; 657 AA.

AC Q80ST1;

DT 01-JUN-2003 (TrEMBLrel. 24, Created)

DT 01-JUN-2003 (TrEMBLrel. 24, Last sequence update)

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update)

DE ATP-binding cassette protein G2 transcript variant B (ATP-binding

DE cassette protein G2 transcript variant C) (ATP-binding cassette

DE protein G2 transcript variant A).

GN ABCG2.

OS Rattus norvegicus (Rat).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Rattus.

OX NCBI_TaxID=10116;

RN [1]

RP SEQUENCE FROM N.A.

RC STRAIN=Sprague-Dawley; TISSUE=Liver;

RA Yabuuchi H., Ishikawa T.;

RL Submitted (MAR-2002) to the EMBL/GenBank/DDBJ databases.

DR EMBL; AY089996; AAM09106.1; -. 

DR EMBL; AY089997; AAM09107.1; -.

DR EMBL; AY089998; AAM09108.1; -.

DR GO; GO:0016020; C:membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.

DR GO; GO:0000166; F:nucleotide binding; IEA.

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003593; AAA_ATPase. 

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR006162; Ppantne_S.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

DR PROSITE; PS00012; PHOSPHOPANTETHEINE; 1. 

KW ATP-binding. 

SQ SEQUENCE 657 AA; 72960 MW; E194871E1C1AC201 CRC64;



Query Match 19.4%; Score 677.5; DB 11; Length 657; 

Best Local Similarity 28.4%; Pred. No. 1.3e-42; 

Matches 181; Conservative 130; Mismatches 251; Indels 75; Gaps 16; 

Qy 91 LSF KVRSGQML AI I GS SGCGRASLLDVITGRG 122 

III I I : I I : : I I : I : I I : : I I I I I : I 

Db 37 LSFHHITYRVKVKSGFLVRKT7VEKEILSDINGIMKPGLNAILGPTGGGKSSLLDVLAAR- 95 

Qy 123 HGGKMKSGQIWINGQPSTPQLVRKCVA-HVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQ 181 

: II : I I I I I I : : I I I : : I I I I I I I I : I I I : 

Db 96 KDPRGLSGDVLINGAPQPANF--KCSSGYWQDDWMGTLTVRENLQFSAALRLPKAMKT 153

Qy 182 AQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTS 241 

::::!: : I II I : I : : : I I : I I : I I I I I : I I I I : : I : : I II Mill: 
Db 154 HEKNERINTIIKELGLDKVADSKVGTQFTRGISGGERKRTSIGMELITDPSILFLDEPTT 213 

Qy 242 GLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMV 301

I I I I I I : : : I I : : I I : : I : I I I I I I : I I I : I : I I : : I I I : : 
Db 214 GLDSSTANAVLLLLKRMSKQGRTIIFSIHQPRYSIFKLFDSLTLLASGKLMFHGPAQKAL 273 

Qy 302 QYFTSIGHPCPRYSNPADFYVDLTS IDRRSKEREVATVEKAQSLAALFLEKVQ 354

: I I I I : I | : I I I I I : : I : : : : I : : I I : : I : 

Db 274 EYFASAGYHCEPYNNPADFFLDVINGDSSAVMLNRGEQDHEANKTEEPSKREKPIIENLA 333 

Qy 355 GF--DDFLWKAEAKELNTSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQISNDFRD 412

I : :: I I : I : I : I I : I : I I :: 

Db 334 EFYINSTIYGETKAELD QLPVAQKKKGSSAFREPVYVTSFCHQLRWIARRSFKN 387 

Qy 413 L PTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVIL 464 

I I : : : : I I I I I I : I : | : | : | : 

Db 388 LLGNPQASVAQLIV TVILGLIIGALYFGLKNDPTGMQNRAGVFFFLTTNQCFTSV- 442 

Qy 465 DWSKCHSERSMLYYELEDGLYTAGPYFFAKILGE-LPEHCAYVIIYAMPIYWLTNLRPV 523

I I : : : I II I I I I : : : I I : I I : I : : I : 

Db 443 SAVELFWEKKLFIHEYISGYYRVSSYFFGKLVSDLLPMRFLPSVIYTCLLYFMLGLKRT 502 



Qy 524 PELFLLHFLLVWLWFCCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNL- 582



Db 503 VEAFFIMMFTLI^AYTASSMALM 562 

Qy 583 -WIVPAWISKLSFLRWCFSGLMQIQFNGH LYTTQIGNFTFSILGDTMIS-A 631 

I : : I : I I : I : I : I I : : I : :: I I : I : 

Db 563 PWL--SWLQYFSIPRYGFTALQHNEFLGQEFCPGLNVTMNSTCVNSYTICTGNDYLINQG 620 

Qy 632 MDLNSHPLYAIYLIVIGISYGFLFLYYLSLKLIKQKS 668 

: I I : I : : : : : I I : I I I : I : I 
Db 621 IDLSPWGLWRNHVALACMIIIFLTIAYLKLLFLKKYS 657



Search completed: February 27, 2004, 07:15:27 
Job time : 39.3051 secs



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM protein - protein search, using sw model 
Run on: February 27, 2004, 06:40:43 ; Search time 10.4048 Seconds
(without alignments)
3362.970 Million cell updates/sec

Title: US-09-989-981A-4
Perfect score: 3494
Sequence: 1 MAEKTKEETQLWNGTVLQDA FLFLYYLSLKLIKQKSIQDW 672



Scoring table: BLOSUM62 

Gapop 10.0, Gapext 0.5

Searched: 141681 seqs, 52070155 residues 

Total number of hits satisfying chosen parameters: 141681

Minimum DB seq length: 0

Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%

Maximum Match 100%
Listing first 45 summaries

Database: SwissProt 42:*

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
ALIGNMENTS



RESULT 1 
ABG8_MOUSE

ID ABG8_MOUSE STANDARD; PRT; 673 AA.

AC Q9DBM0; 

DT 28-FEB-2003 (Rel. 41, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 

DT 28-FEB-2003 (Rel. 41, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 8 (Sterolin-2).

GN ABCG8.

OS Mus musculus (Mouse).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.

OX NCBI_TaxID=10090; 

RN [1] 

RP SEQUENCE FROM N.A. (ISOFORMS 1 AND 2).

RC STRAIN=C57BL/6; TISSUE=Liver;

RX MEDLINE=21344600; PubMed=11452359;

RA Lu K., Lee M.-H., Hazard S., Brooks-Wilson A., Hidaka H., Kojima H.,

RA Ose L., Stalenhoef A.F.H., Mietinnen T., Bjorkhem I., Bruckert E.,

RA Pandya A., Brewer H.B. Jr., Salen G., Dean M., Srivastava A.K.,

RA Patel S.B.;

RT "Two genes that map to the STSL locus cause sitosterolemia: genomic

RT structure and spectrum of mutations involving sterolin-1 and 

RT sterolin-2, encoded by ABCG5 and ABCG8, respectively."; 



RL Am. J. Hum. Genet. 69:278-290(2001). 

RN [2] 

RP SEQUENCE FROM N.A. (ISOFORM 1).

RC STRAIN=C57BL/6J; TISSUE=Liver;

RX MEDLINE=21085660; PubMed=11217851; 

RA Kawai J., Shinagawa A., Shibata K., Yoshino M., Itoh M., Ishii Y.,

RA Arakawa T., Hara A., Fukunishi Y., Konno H., Adachi J., Fukuda S.,

RA Aizawa K., Izawa M., Nishi K., Kiyosawa H., Kondo S., Yamanaka I.,

RA Saito T., Okazaki Y., Gojobori T., Bono H., Kasukawa T., Saito R.,

RA Kadota K., Matsuda H.A., Ashburner M., Batalov S., Casavant T.,

RA Fleischmann W., Gaasterland T., Gissi C., King B., Kochiwa H.,

RA Kuehl P., Lewis S., Matsuo Y., Nikaido I., Pesole G., Quackenbush J.,

RA Schriml L.M., Staubli F., Suzuki R., Tomita M., Wagner L., Washio T.,

RA Sakai K., Okido T., Furuno M., Aono H., Baldarelli R., Barsh G.,

RA Blake J., Boffelli D., Bojunga N., Carninci P., de Bonaldo M.F.,

RA Brownstein M.J., Bult C., Fletcher C., Fujita M., Gariboldi M.,

RA Gustincich S., Hill D., Hofmann M., Hume D.A., Kamiya M., Lee N.H.,

RA Lyons P., Marchionni L., Mashima J., Mazzarelli J., Mombaerts P.,

RA Nordone P., Ring B., Ringwald M., Rodriguez I., Sakamoto N.,

RA Sasaki H., Sato K., Schoenbach C., Seya T., Shibata Y., Storch K.-F.,

RA Suzuki H., Toyo-oka K., Wang K.H., Weitz C., Whittaker C., Wilming L.,

RA Wynshaw-Boris A., Yoshida K., Hasegawa Y., Kawaji H., Kohtsuki S.,

RA Hayashizaki Y.;

RT "Functional annotation of a full-length mouse cDNA collection."; 

RL Nature 409:685-690(2001). 

RN [3] 

RP TISSUE SPECIFICITY, AND INDUCTION. 

RX MEDLINE=20553648; PubMed=11099417;

RA Berge K.E., Tian H., Graf G.A., Yu L., Grishin N.V., Schultz J.,

RA Kwiterovich P., Shan B., Barnes R., Hobbs H.H.;

RT "Accumulation of dietary cholesterol in sitosterolemia caused by 

RT mutations in adjacent ABC transporters."; 

RL Science 290:1771-1775(2000).

CC -!- FUNCTION: Transporter that appears to play an indispensable role 
CC in the selective transport of the dietary cholesterol in and out 

CC of the enterocytes and in the selective sterol excretion by the 

CC liver into bile. 

CC -!- SUBUNIT: May form heterodimers with ABCG5 or be tightly coupled to
CC ABCG5 along a pathway regulating dietary-sterol absorption and

CC excretion (By similarity).

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Probable). 

CC -!- ALTERNATIVE PRODUCTS: 

CC Event=Alternative splicing; Named isoforms=2; 

CC Name=1;

CC IsoId=Q9DBM0-1; Sequence=Displayed;

CC Name=2;

CC IsoId=Q9DBM0-2; Sequence=VSP_000053;

CC Note=No experimental confirmation available; 

CC -!- TISSUE SPECIFICITY: Expressed in the intestine and, at lower 

CC level, in the liver. 

CC -!- INDUCTION: Upregulated by cholesterol feeding. Possibly mediated 
CC by the liver X receptor/retinoid X receptor (LXR/RXR) pathway.

CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White) 
CC subfamily. 

CC -!- CAUTION: Seems to have a defective ATP-binding region. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 



CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; AF324495; AAK84079.1; -. 

DR EMBL; AK004871; BAB23630.1; -.

DR MGD; MGI:1914720; Abcg8.

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1. 

DR ProDom; PD000006; ABC_transporter; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW Glycoprotein; Transmembrane; Transport; Alternative splicing. 
FT /FTId=VSP_000053.


SQ SEQUENCE 673 AA; 75995 MW; 78012611A5DF2589 CRC64;



Query Match 99.7%; Score 3483.5; DB 1; Length 673; 

Best Local Similarity 99.9%; Pred. No. 4e-251; 

Matches 672; Conservative 0; Mismatches 0; Indels 1; Gaps 1; 

Qy          1 MAEKTKEETQLWNGTVLQDAS-GLQDSLFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIA 59
              ||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||
Db          1 MAEKTKEETQLWNGTVLQDASQGLQDSLFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIA 60

Qy         60 SQVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVIT 119
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db         61 SQVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVIT 120

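Qy        120 GRGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTF 179
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        121 GRGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTF 180

Qy        180 SQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEP 239
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        181 SQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEP 240

Qy        240 TSGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQ 299
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        241 TSGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQ 300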



Qy        300 MVQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDF 359
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        301 MVQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDF 360

Qy        360 LWKAEAKELNTSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIH 419
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        361 LWKAEAKELNTSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIH 420

Qy        420 GSEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDVVSKCHSERSMLYY 479
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        421 GSEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDVVSKCHSERSMLYY 480

Qy        480 ELEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLVVF 539
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        481 ELEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLVVF 540

Qy        540 CCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCF 599
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        541 CCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCF 600

Qy        600 SGLMQIQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYYL 659
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        601 SGLMQIQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYYL 660

Qy        660 SLKLIKQKSIQDW 672
              |||||||||||||
Db        661 SLKLIKQKSIQDW 673
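
The Best Local Similarity values printed in this report are reproduced by dividing Matches by the total number of alignment columns (Matches + Conservative + Mismatches + Indels). The Python sketch below checks that reading against the counts reported for RESULT 1 and RESULT 2. The function name is hypothetical, and the formula is inferred from the reported numbers rather than taken from the search program's documentation.

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Percent of alignment columns that are exact matches.

    Assumed definition: every column of the alignment is counted exactly
    once as a match, a conservative substitution, a mismatch or an indel.
    """
    columns = matches + conservative + mismatches + indels
    return round(100.0 * matches / columns, 1)


# Counts copied from the "Matches ...; Indels ...;" summary lines.
print(best_local_similarity(672, 0, 0, 1))     # RESULT 1, reported as 99.9%
print(best_local_similarity(613, 29, 30, 22))  # RESULT 2, reported as 88.3%
```

Under this reading the denominator is the alignment length including gap columns, so a single long gap (as in RESULT 2) lowers the figure even when the aligned residues are nearly identical.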



RESULT 2 
ABG8_RAT 

ID ABG8_RAT STANDARD; PRT; 694 AA. 

AC P58428; Q8CIQ5; Q923R7; 

DT 28-FEB-2003 (Rel. 41, Created) 

DT 15-MAR-2004 (Rel. 43, Last sequence update) 

DT 15-MAR-2004 (Rel. 43, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 8 (Sterolin-2).

GN ABCG8.

OS Rattus norvegicus (Rat).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Rattus.

OX NCBI_TaxID=10116; 

RN [1] 

RP SEQUENCE FROM N.A. (ISOFORMS 1 AND 2).

RC STRAIN=Sprague-Dawley;

RX MEDLINE=21344600; PubMed=11452359;

RA Lu K., Lee M.-H., Hazard S., Brooks-Wilson A., Hidaka H., Kojima H.,

RA Ose L., Stalenhoef A.F.H., Mietinnen T., Bjorkhem I., Bruckert E.,

RA Pandya A., Brewer H.B. Jr., Salen G., Dean M., Srivastava A.K.,

RA Patel S.B.;

RT "Two genes that map to the STSL locus cause sitosterolemia: genomic

RT structure and spectrum of mutations involving sterolin-1 and 

RT sterolin-2, encoded by ABCG5 and ABCG8, respectively."; 

RL Am. J. Hum. Genet. 69:278-290(2001). 

RN [2] 

RP REVISIONS TO 3-4. 



RA Lu K., Yu H., Lee M.-H., Patel S.B.; 

RL Submitted (AUG-2002) to the EMBL/GenBank/DDBJ databases. 

RN [3] 

RP SEQUENCE FROM N.A. (ISOFORMS 1 AND 3), AND TISSUE SPECIFICITY. 

RC STRAIN=GH, SHR, SHRSP, Sprague-Dawley, Wistar, Wistar Kyoto, and WKA; 

RC TISSUE=Intestine, and Liver; 

RX PubMed=12783625;

RA Yu H., Pandit B., Klett E., Lee M.-H., Lu K., Helou K., Ikeda I.,

RA Egashira N., Sato M., Klein R., Batta A., Salen G., Patel S.B.;

RT "The rat STSL locus: characterization, chromosomal assignment, and 

RT genetic variations in sitosterolemic hypertensive rats."; 

RL BMC Cardiovasc. Disord. 3:4-4(2003). 

CC -!- FUNCTION: Transporter that appears to play an indispensable role 
CC in the selective transport of the dietary cholesterol in and out 

CC of the enterocytes and in the selective sterol excretion by the 

CC liver into bile. 

CC -!- SUBUNIT: May form heterodimers with ABCG5 or be tightly coupled to 
CC ABCG5 along a pathway regulating dietary-sterol absorption and

CC excretion (By similarity).

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Probable). 

CC -!- ALTERNATIVE PRODUCTS: 

CC Event=Alternative splicing; Named isoforms=3; 

CC Name=3; 

CC IsoId=P58428-3; Sequence=Displayed; 

CC Name=1;

CC IsoId=P58428-1; Sequence=VSP_008767;

CC Name=2;

CC IsoId=P58428-2; Sequence=VSP_008767, VSP_000054;

CC Note=No experimental confirmation available; 

CC -!- TISSUE SPECIFICITY: Highest expression in liver, with lower levels 
CC in small intestine and colon. 

CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White) 
CC subfamily. 

CC -!- CAUTION: Seems to have a defective ATP-binding region. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; AF351785; AAK84831.2; -. 

DR EMBL; AY145899; AAN64276.1; -. 

DR EMBL; AF404109; AAK85393.1; -. 

DR InterPro; IPR003593; AAA_ATPase. 

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW Glycoprotein; Transmembrane; Transport; Alternative splicing. 

FT DOMAIN 1 434 CYTOPLASMIC (POTENTIAL).

FT TRANSMEM 435 455 1 (POTENTIAL).

FT DOMAIN 456 468 EXTRACELLULAR (POTENTIAL).

FT /FTId=VSP_008767.

FT VARSPLIC 398 398 Missing (in isoform 2).

FT /FTId=VSP_000054.

FT CONFLICT 3 4 EK -> QT (IN REF. 3).

SQ SEQUENCE 694 AA; 78236 MW; 67F67C195F417587 CRC64;



Query Match 91.7%; Score 3204; DB 1; Length 694; 

Best Local Similarity 88.3%; Pred. No. 2.4e-230; 

Matches 613; Conservative 29; Mismatches 30; Indels 22; Gaps 1; 

Qy          1 MAEKTKEETQLWNGTVLQDASGLQDSLFSSESDNSLYFTYSGQSNTLEVRDLTYQ----- 55
              ||||||||||||||||||||| ||||:||||||||||||||||||||||||||||
Db          1 MAEKTKEETQLWNGTVLQDASSLQDSVFSSESDNSLYFTYSGQSNTLEVRDLTYQGGTCL 60

Qy         56 -----------------VDIASQVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSG 98
                               ||:||||||||||||||:||||  |||| :|||||||||||||
Db         61 RSWGQEDPHMSLGLSESVDMASQVPWFEQLAQFKLPWRSRGSQDSWDLGIRNLSFKVRSG 120

Qy         99 QMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLL 158
              ||||||||:|||||:|||||||| |||||||||||||||||||||::|||||||| ||||
Db        121 QMLAIIGSAGCGRATLLDVITGRDHGGKMKSGQIWINGQPSTPQLIQKCVAHVRQQDQLL 180

Qy        159 PNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGER 218
              ||||||||| ||||||||:|||||||||||||||||||||||||||||||||||||||||
Db        181 PNLTVRETLTFIAQMRLPKTFSQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGER 240

Qy        219 RRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFR 278
              |||||||||||||||||||||||||||||||||| |||||||||||||||||||||||||
Db        241 RRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVRTLSRLAKGNRLVLISLHQPRSDIFR 300

Qy        279 LFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVAT 338
              ||||||||||||||||| || |||||||||:||||||||||||||||||||||||:||||
Db        301 LFDLVLLMTSGTPIYLGVAQHMVQYFTSIGYPCPRYSNPADFYVDLTSIDRRSKEQEVAT 360

Qy        339 VEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLTLTQDTDCGTAVELPGMIEQF 398
              :|||: |||||||||||||||||||||| |:| |: || ||||||:|||| ||||||:||
Db        361 MEKARLLAALFLEKVQGFDDFLWKAEAKSLDTGTYAVSQTLTQDTNCGTAAELPGMIQQF 420

Qy        399 STLIRRQISNDFRDLPTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALI 458
              :||||||||||||||||| |||:||||||||||||||||  | ||||| |||||||||||
Db        421 TTLIRRQISNDFRDLPTLFIHGAEACLMSLIIGFLYYGHADKPLSFMDMAALLFMIGALI 480

Qy        459 PFNVILDVVSKCHSERSMLYYELEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLT 518
              |||||||||||||||||:|||||||||||||||||||:|||||||||||||| |||||||
Db        481 PFNVILDVVSKCHSERSLLYYELEDGLYTAGPYFFAKVLGELPEHCAYVIIYGMPIYWLT 540

Qy        519 NLRPVPELFLLHFLLVWLVVFCCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMIN 578
              |||| ||||||||:|:|||||||||||||||||||||||||| |||||||||||||||||
Db        541 NLRPGPELFLLHFMLLWLVVFCCRTMALAASAMLPTFHMSSFCCNALYNSFYLTAGFMIN 600

Qy        579 LDNLWIVPAWISKLSFLRWCFSGLMQIQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHP 638
              |:|||||||||||:||||||||||||||||||:||||||| |||: || |::||||||||
Db        601 LNNLWIVPAWISKMSFLRWCFSGLMQIQFNGHIYTTQIGNLTFSVPGDAMVTAMDLNSHP 660

Qy        639 LYAIYLIVIGISYGFLFLYYLSLKLIKQKSIQDW 672
              |||||||||||| ||| ||||||| |||||||||
Db        661 LYAIYLIVIGISCGFLSLYYLSLKFIKQKSIQDW 694
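
The line between each Qy row and Db row marks identical residues with "|" and, in this report, some non-identical pairs with ":". A minimal Python sketch of how such a line can be rebuilt from two aligned rows follows. The function name and the `similar` argument are hypothetical; the report does not state which scoring matrix decides the ":" marks, so the set of similar pairs is left to the caller.

```python
def match_line(query_row, db_row, similar=frozenset()):
    """Build the marker line for two aligned rows of equal length.

    "|" marks identical residues, ":" marks pairs listed in `similar`
    (an assumed stand-in for the program's scoring matrix), and a blank
    marks every other column, including gap columns written as "-".
    """
    marks = []
    for q, d in zip(query_row, db_row):
        if q == "-" or d == "-":
            marks.append(" ")
        elif q == d:
            marks.append("|")
        elif (q, d) in similar or (d, q) in similar:
            marks.append(":")
        else:
            marks.append(" ")
    return "".join(marks)


# Last block of RESULT 2 (query 639-672 against database 661-694).
qy = "LYAIYLIVIGISYGFLFLYYLSLKLIKQKSIQDW"
db = "LYAIYLIVIGISCGFLSLYYLSLKFIKQKSIQDW"
print(qy)
print(match_line(qy, db))
print(db)
```

Counting the "|" characters over all blocks of a hit gives the Matches figure of its summary line.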



RESULT 3 
ABG8_HUMAN 

ID ABG8_HUMAN STANDARD; PRT; 673 AA. 

AC Q9H221; 

DT 28-FEB-2003 (Rel. 41, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 

DT 10-OCT-2003 (Rel. 42, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 8 (Sterolin-2).

GN ABCG8.

OS Homo sapiens (Human).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.

OX NCBI_TaxID=9606;

RN [1] 

RP SEQUENCE FROM N.A., VARIANTS SITOSTEROLEMIA THR-231; GLN-263; ARG-574

RP AND ARG-596, AND VARIANT CYS-54.

RX MEDLINE=20553648; PubMed=11099417;

RA Berge K.E., Tian H., Graf G.A., Yu L., Grishin N.V., Schultz J.,

RA Kwiterovich P., Shan B., Barnes R., Hobbs H.H.;

RT "Accumulation of dietary cholesterol in sitosterolemia caused by 

RT mutations in adjacent ABC transporters."; 

RL Science 290:1771-1775(2000). 

RN [2] 

RP SEQUENCE FROM N.A. (ISOFORMS 1 AND 2), VARIANTS SITOSTEROLEMIA 

RP HIS-184; THR-231; GLN-263; HIS-405; PRO-501; SER-543; PRO-572; 

RP GLU-574; ARG-574; ARG-596 AND PHE-570 DEL, AND VARIANTS HIS-19; 

RP CYS-54; LYS-238; VAL-259; LYS-400; ARG-575 AND ALA-632. 

RC TISSUE=Liver;

RX MEDLINE=21344600; PubMed=11452359;

RA Lu K., Lee M.-H., Hazard S., Brooks-Wilson A., Hidaka H., Kojima H.,

RA Ose L., Stalenhoef A.F.H., Mietinnen T., Bjorkhem I., Bruckert E.,

RA Pandya A., Brewer H.B. Jr., Salen G., Dean M., Srivastava A.K.,

RA Patel S.B.; 

RT "Two genes that map to the STSL locus cause sitosterolemia: genomic 

RT structure and spectrum of mutations involving sterolin-1 and 

RT sterolin-2, encoded by ABCG5 and ABCG8, respectively."; 

RL Am. J. Hum. Genet. 69:278-290(2001). 

RN [3] 

RP REVIEW. 

RX MEDLINE=21474438; PubMed=11590207;

RA Schmitz G., Langmann T., Heimerl S.; 

RT "Role of ABCG1 and other ABCG family members in lipid metabolism."; 

RL J. Lipid Res. 42:1513-1520(2001). 



CC -!- FUNCTION: Transporter that appears to play an indispensable role 
CC in the selective transport of the dietary cholesterol in and out 

CC of the enterocytes and in the selective sterol excretion by the 

CC liver into bile. 

CC -!- SUBUNIT: May form heterodimers with ABCG5 or be tightly coupled to 
CC ABCG5 along a pathway regulating dietary-sterol absorption and

CC excretion. 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Probable). 

CC -!- ALTERNATIVE PRODUCTS: 

CC Event=Alternative splicing; Named isoforms=2; 

CC Name=1;

CC IsoId=Q9H221-1; Sequence=Displayed;

CC Name=2;

CC IsoId=Q9H221-2; Sequence=VSP_000052;

CC Note=Minor form detected in approximately 10% of the cDNA

CC clones;

CC -!- TISSUE SPECIFICITY: Strongly expressed in the liver, lower levels 
CC in the small intestine and colon. Detectable in a wide variety of 

CC human tissues. 

CC -!- DISEASE: Defects in ABCG8 are a cause of sitosterolemia 

CC [MIM:210250]; also known as phytosterolemia or shellfish

CC sterolemia. It is a rare autosomal recessive disorder 

CC characterized by increased intestinal absorption of all sterols 

CC including cholesterol, plant and shellfish sterols, and decreased 

CC biliary excretion of dietary sterols into bile. Sitosterolemia 

CC patients have hypercholesterolemia, very high levels of plant 

CC sterols in the plasma, and frequently develop tendon and tuberous 

CC xanthomas, accelerated atherosclerosis and premature coronary 

CC artery disease. 

CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White) 
CC subfamily. 

CC -!- CAUTION: Seems to have a defective ATP-binding region. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; AF320294; AAG40004.1; -. 

DR EMBL; AF324494; AAK84078.1; -. 

DR EMBL; AF351824; AAK84663.1; -. 

DR EMBL; AF351812; AAK84663.1; JOINED. 

DR EMBL; AF351813; AAK84663.1; JOINED. 

DR EMBL; AF351814; AAK84663.1; JOINED. 

DR EMBL; AF351815; AAK84663.1; JOINED. 

DR EMBL; AF351816; AAK84663.1; JOINED. 

DR EMBL; AF351817; AAK84663.1; JOINED. 

DR EMBL; AF351818; AAK84663.1; JOINED. 

DR EMBL; AF351819; AAK84663.1; JOINED. 

DR EMBL; AF351820; AAK84663.1; JOINED. 

DR EMBL; AF351821; AAK84663.1; JOINED.

DR EMBL; AF351822; AAK84663.1; JOINED. 

DR EMBL; AF351823; AAK84663.1; JOINED. 

DR Genew; HGNC:13887; ABCG8.



DR MIM; 605460; -.
DR MIM; 210250; -.
DR InterPro; IPR003439; ABC_transporter.
DR Pfam; PF00005; ABC_tran; 1.
DR ProDom; PD000006; ABC_transporter; 1.
DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.
DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.
KW Glycoprotein; Transmembrane; Transport; Alternative splicing;
KW Polymorphism; Disease mutation.
FT   DOMAIN                       CYTOPLASMIC (POTENTIAL).
FT   TRANSMEM                     1 (POTENTIAL).
FT   DOMAIN                       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM                     2 (POTENTIAL).
FT   DOMAIN                       CYTOPLASMIC (POTENTIAL).
FT   TRANSMEM                     3 (POTENTIAL).
FT   DOMAIN                       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM                     4 (POTENTIAL).
FT   DOMAIN                       CYTOPLASMIC (POTENTIAL).
FT   TRANSMEM                     5 (POTENTIAL).
FT   DOMAIN                       EXTRACELLULAR (POTENTIAL).
FT   TRANSMEM                     6 (POTENTIAL).
FT   DOMAIN                       CYTOPLASMIC (POTENTIAL).
FT   CARBOHYD                     N-LINKED (GLCNAC...) (POTENTIAL).
FT   VARSPLIC                     Missing (in isoform 2).
FT                                /FTId=VSP_000052.
FT   VARIANT                      D -> H.
FT                                /FTId=VAR_012250.
FT   VARIANT                      Y -> C.
FT                                /FTId=VAR_012251.
FT   VARIANT                      R -> H (in sitosterolemia).
FT                                /FTId=VAR_012252.
FT   VARIANT                      P -> T (in sitosterolemia).
FT                                /FTId=VAR_012253.
FT   VARIANT                      E -> K.
FT                                /FTId=VAR_012254.
FT   VARIANT                      A -> V.
FT                                /FTId=VAR_012255.
FT   VARIANT                      R -> Q (in sitosterolemia).
FT                                /FTId=VAR_012256.
FT   VARIANT                      T -> K.
FT                                /FTId=VAR_012257.
FT   VARIANT                      R -> H (in sitosterolemia).
FT                                /FTId=VAR_012258.
FT   VARIANT                      L -> P (in sitosterolemia).
FT                                /FTId=VAR_012259.
FT   VARIANT                      R -> S (in sitosterolemia).
FT                                /FTId=VAR_012260.
FT   VARIANT                      Missing (in sitosterolemia).
FT                                /FTId=VAR_012261.
FT   VARIANT                      L -> P (in sitosterolemia).
FT                                /FTId=VAR_012262.
FT   VARIANT                      G -> E (in sitosterolemia).
FT                                /FTId=VAR_012263.
FT   VARIANT                      G -> R (in sitosterolemia).
FT                                /FTId=VAR_012264.
FT   VARIANT                      G -> R.
FT                                /FTId=VAR_012265.



FT VARIANT 596 596 L -> R (in sitosterolemia).

FT /FTId=VAR_012266.

FT VARIANT 632 632 V -> A.

FT /FTId=VAR_012267.

SQ SEQUENCE 673 AA; 75678 MW; 594AFD1D6C1BB50F CRC64; 

Query Match 82.4%; Score 2879.5; DB 1; Length 673;

Best Local Similarity 81.7%; Pred. No. 3e-206;

Matches 550; Conservative 52; Mismatches 70; Indels 1; Gaps 1; 

Qy 1 MAEKTKEETQLWNGTVLQDASGLQDSLFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIAS 60 

III II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I 
Db 1 MAGKAAEERGLPKGATPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLAS 60 

Qy 61 QVPWFEQLAQFKI PWRSHSSQDSCELGI RNLS FKVRSGQMLAI I GS SGCGRASLLDVITG 120 

I I I I I I I I I I I I : I I I I I : I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 61 QVPWFEQLAQFKMPWTSPSCQNSCELGIQNLS FKVRSGQMLAI I GS SGCGRASLLDVITG 12 0 

Qy 121 RGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFS 180 

I I I I M : I I I I I I I I I I I I : I II I I I I I I I I I I I : I I M I I I I I I I I I I I I I I I I I I I I I 
Db 121 RGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFS 180 

Qy 181 QAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPT 240 

I I I I I I II I I I I I I I I I I I I I : I II I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 181 QAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPT 240 

Qy 241 SGLDSFTAHNLWTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQM 300 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I II I I I I I I I I I I I I I I I 

Db 241 S GL D S FT AHN L VKT L S RLAKGNRL VL ISLHQPRSDI FRL FD L VL LMT S GT P I YLGAAQHM 300 

Qy 301 VQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFL 360 

II I I I : I I : I I I I I I I I II I I I I I I I I I I I I : I : I : I I I I I I I I I I I I I I I I : I I I I 

Db 301 VQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALFLEKVRDLDDFL 360 

Qy 361 WKAEAKELNTSTHTVSLTLTQDTDC-GTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIH 419 

I I I I I : I : I I I I : I : : : I I : : I I : I II I I I I I I I I I I I I I I I I I 

Db 361 WKAETKDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLIH 420 

Qy 420 GSEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDWSKCHSERSMLYY 479 

I : I I I I I I : llllhlll: I I II I I I II I M I I I I I I I I I I I I I I : I I I : I I I : I I I I 
Db 421 GAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLYY 480 

Qy 4 80 ELEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLWF 539 

I I I I I I II I I I I I I I I I I I I I I I I I I : I I I II III I I I I : I I I I I I I I I I I I I 
Db 481 ELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLWF 540 

Qy 540 CCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCF 599 

III I II I I : I : I I I I I I : I I I I I I I I I I I I I I I I I I : I I I I I I I I I : I I I I I I I 

Db 541 CCRIMALAAAALLPTFHMAS FFSNALYNSFYLAGGFMINLS SLWTVPAWI SKVS FLRWCF 600 

Qy 600 SGLMQIQFNGHLYTTQIGNFTFSILGDTMI SAMDLNSHPLYAIYLIVIGISYGFLFLYYL 659 

|||:|||: I :M I :: II ::| I : I : I : I I I I I II I I I I : I lh III: 
Db 601 EGLMKIQFSRRTYKMPLGNLTIAVSGDKILSVMELDSYPLYAIYLIVIGLSGGFMVLYYV 660 

Qy 660 SLKLIKQKSIQDW 672 

11:1111 III 
Db 661 SLRFIKQKPSQDW 673 



RESULT 4 
ABG5_RAT



ID ABG5_RAT STANDARD; PRT; 652 AA. 

AC Q99PE7; Q8CIQ4; 

DT 28-FEB-2003 (Rel. 41, Created) 

DT 10-OCT-2003 (Rel. 42, Last sequence update) 

DT 10-OCT-2003 (Rel. 42, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 5 (Sterolin-1).

GN ABCG5.

OS Rattus norvegicus (Rat).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Rattus. 

OX NCBI_TaxID=10116; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=Sprague-Dawley; TISSUE=Small intestine; 

RX MEDLINE=20578753; PubMed=11138003;

RA Lee M.-H., Lu K., Hazard S., Yu H., Shulenin S., Hidaka H., Kojima H.,

RA Allikmets R., Sakuma N., Pegoraro R., Srivastava A.K., Salen G.,

RA Dean M., Patel S.B.;

RT "Identification of a gene, ABCG5, important in the regulation of 

RT dietary cholesterol absorption."; 

RL Nat. Genet. 27:79-83(2001). 

RN [2] 

RP REVISION TO 2. 

RA Lu K., Lee M.-H., Patel S.B.;

RL Submitted (AUG-2002) to the EMBL/GenBank/DDBJ databases.

RN [3] 

RP SEQUENCE FROM N.A., TISSUE SPECIFICITY, AND VARIANT CYS-583. 

RC STRAIN=GH, SHR, SHRSP, Sprague-Dawley, Wistar, Wistar Kyoto, and WKA; 

RX PubMed=12783625;

RA Yu H., Pandit B., Klett E., Lee M.H., Lu K., Helou K., Ikeda I.,

RA Egashira N., Sato M., Klein R., Batta A., Salen G., Patel S.B.;

RT "The rat STSL locus: characterization, chromosomal assignment, and

RT genetic variations in sitosterolemic hypertensive rats.";

RL BMC Cardiovasc. Disord. 3:4-4(2003).

CC -!- FUNCTION: Transporter that appears to play an indispensable role 
CC in the selective transport of the dietary cholesterol in and out 

CC of the enterocytes and in the selective sterol excretion by the 

CC liver into bile. 

CC -!- SUBUNIT: May form heterodimers with ABCG8 or be tightly coupled to 
CC ABCG8 along a pathway regulating dietary-sterol absorption and

CC excretion (By similarity).

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Probable). 

CC -!- TISSUE SPECIFICITY: Expressed only in liver and intestine. 

CC -!- POLYMORPHISM: The polymorphism at position 583 is found in strains 

CC SHR, SHRSP and Wistar Kyoto which are both hypertensive and 

CC sitosterolemic. Strains which are hypertensive but not 

CC sitosterolemic do not contain a polymorphism at this position. 

CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White) 

CC subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; AF312714; AAG53098.3; -.

DR EMBL; AY145899; AAN64275.1; -.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW ATP-binding; Glycoprotein; Transmembrane; Transport; Polymorphism. 
FT DOMAIN 485 504 EXTRACELLULAR (POTENTIAL).

FT TRANSMEM 505 525

FT DOMAIN 526 529 CYTOPLASMIC (POTENTIAL).

FT TRANSMEM 530 550 5 (POTENTIAL).

FT DOMAIN 551 624 EXTRACELLULAR (POTENTIAL).

FT TRANSMEM 625 645 6 (POTENTIAL).

FT DOMAIN 646 652 CYTOPLASMIC (POTENTIAL).

FT NP_BIND 87 94 ATP (POTENTIAL).

FT CARBOHYD 585 585 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 592 592 N-LINKED (GLCNAC...) (POTENTIAL).

FT VARIANT 583 583 G -> C (in strains SHR, SHRSP and Wistar

FT Kyoto).

SQ SEQUENCE 652 AA; 73372 MW; 49FEF7372269299D CRC64;



Query Match 20.3%; Score 710; DB 1; Length 652; 

Best Local Similarity 30.2%; Pred. No. 4.6e-45; 

Matches 190; Conservative 129; Mismatches 258; Indels 52; Gaps 15; 

Qy 18 QDASGLQDSLFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIASQV-PWFEQLAQFKIPWR 76 

: I I :: I I II : : I I : : : : I : : : I I I I 
Db 10 EGARGPHNNRGSQSSLEEGSVTGSEARHSLGVLNVSFSV— SNRVGPW WN 57 

Qy 77 SHSSQDSCELGI-RNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWIN 135 

II : I : : : I : I I I : I : I I I I I : : I I I I : I I I : : : : I 

Db 58 I KSCQQKWDRKI LKDVSLYI ES GQTMCI LGS SGS GKTTLLDAI S GRLRRTGTLEGEVFVN 117 

Qy 136 GQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAEL 195 

I : | | : : : | | I : I I I I I II : I : I I : I II : I I I : I I 

Db 118 GCELRRDQFQDCVSYLLQSDVFLSSLTVRETLRYTAMLAL-RSSSADFYDKKVEAVLTEL 17 6 

Qy 196 RLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTL 255 

I I : : I I I : I I I I I I I I I I I I : I : : : I I I I I : I I I I I : : : I I 

Db 177 SLSHVADQMIGNYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANHIVLLL 236 

Qy 2 56 SRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYS 315 

I | : I I : I : : : : I I I I I : : I I I : : : I I : : I : : I : : I : I : I I I : I 



Db 237 VELARRNRIVIWIHQPRSELFHHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHS 296



Qy 316 NPADFWDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTV 375 

I I I I I : I I I I : I : I : I I I : I : : I I : I : I I : : : I 

Db 297 NPFDFYMDLTSVDTQSREREIETYKRVQMLESAFRQ SDICHKI-LENIERTRHLK 350 

Qy 376 SLTL TQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIG 431 

: I : I : : III: I : I I I I : : : : : : I I : 

Db 351 TLPMVPFKTKNP PGMFCKLGVLLRRVTRNLMRNKQWIMRLVQNLIMGLFLI 4 02 

Qy 432 F— LYYGHGAKQLSFMDTAALLFMIGALIPFNVILDWSKCHSERSMLYYELEDGLYTAG 489 

I I : : : I I I : : I : : I : I : I :: I : I I I I 

Db 403 FYLLRVQNNMLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYQKW 4 62 

Qy 490 PYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELF LL — HFLLVWLWFCC 541 

1:1 El :|:: II II I II I : :| 
Db 463 QMLLAYVLHALPFSIVATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFL 517 

Qy 542 RTMALAASAMLPTFHMS S FFCNAL YNS FYLTAGFMINLDNLWI VPAWI S KLS FLRWCFS G 601 

I : I I : : I : : : I I : I : : : I : : I : : I 

Db 518 -TLVLLGMVQNPNI-VNSIVALLSISGLLIGSGFIRNIEEMPIPLKILGYFTFQKYCCEI 575 

Qy 602 LMQIQFNGHLYTTQIGNFTFSILGDTMIS 630 

I : : I I : I I I : : I I 

Db 576 LWNEFYGLNFT— CGGSNTSVPNNPMCS 602 



RESULT 5 
ABG5_MOUSE 

ID ABG5_MOUSE STANDARD; PRT; 652 AA. 

AC Q99PE8; 

DT 28-FEB-2003 (Rel. 41, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update)

DT 28-FEB-2003 (Rel. 41, Last annotation update)

DE ATP-binding cassette, sub-family G, member 5 (Sterolin-1).

GN ABCG5.

OS Mus musculus (Mouse).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.

OX NCBI_TaxID=10090; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=C57BL/6; TISSUE=Liver;

RX MEDLINE=20578753; PubMed=11138003;

RA Lee M.-H., Lu K., Hazard S., Yu H., Shulenin S., Hidaka H., Kojima H.,

RA Allikmets R., Sakuma N., Pegoraro R., Srivastava A.K., Salen G.,

RA Dean M., Patel S.B.;

RT "Identification of a gene, ABCG5, important in the regulation of 

RT dietary cholesterol absorption."; 

RL Nat. Genet. 27:79-83(2001). 

RN [2] 

RP TISSUE SPECIFICITY, AND INDUCTION. 

RX MEDLINE=20553648; PubMed=11099417;

RA Berge K.E., Tian H., Graf G.A., Yu L., Grishin N.V., Schultz J., 

RA Kwiterovich P., Shan B., Barnes R., Hobbs H.H.; 

RT "Accumulation of dietary cholesterol in sitosterolemia caused by 

RT mutations in adjacent ABC transporters."; 



RL Science 290:1771-1775(2000) . 

CC -!- FUNCTION: Transporter that appears to play an indispensable role 
CC in the selective transport of the dietary cholesterol in and out 

CC of the enterocytes and in the selective sterol excretion by the 

CC liver into bile. 

CC -!- SUBUNIT: May form heterodimers with ABCG8 or be tightly coupled to 
CC ABCG8 along a pathway regulating dietary-sterol absorption and

CC excretion (By similarity).

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Probable). 

CC -!- TISSUE SPECIFICITY: Expressed in the intestine and, at lower 
CC level, in the liver. 

CC -!- INDUCTION: Upregulated by cholesterol feeding. Possibly mediated 
CC by the liver X receptor/retinoid X receptor (LXR/RXR) pathway.

CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White) 
CC subfamily. 

CC — 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; AF312713; AAG53097.1; -. 

DR MGD; MGI:1351659; Abcg5.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW ATP-binding; Glycoprotein; Transmembrane; Transport. 

FT DOMAIN 1 385 CYTOPLASMIC (POTENTIAL) . 

FT TRANSMEM 386 406 1 (POTENTIAL). 

FT DOMAIN 407 422 EXTRACELLULAR (POTENTIAL) . 

FT TRANSMEM 423 443 2 (POTENTIAL).

FT DOMAIN 444 463 CYTOPLASMIC (POTENTIAL) . 

FT TRANSMEM 464 484 3 (POTENTIAL). 

FT DOMAIN 485 504 EXTRACELLULAR (POTENTIAL) . 

FT TRANSMEM 505 525 4 (POTENTIAL). 

FT DOMAIN 526 529 CYTOPLASMIC (POTENTIAL) . 

FT TRANSMEM 530 550 5 (POTENTIAL). 

FT DOMAIN 551 622 EXTRACELLULAR (POTENTIAL).

FT TRANSMEM 623 643 6 (POTENTIAL). 

FT DOMAIN 644 652 CYTOPLASMIC (POTENTIAL). 

FT NP_BIND 87 94 ATP (POTENTIAL) . 

FT CARBOHYD 410 410 N-LINKED (GLCNAC. . .) (POTENTIAL). 

FT CARBOHYD 585 585 N-LINKED (GLCNAC. . .) (POTENTIAL). 

FT CARBOHYD 592 592 N-LINKED (GLCNAC. . .) (POTENTIAL). 

SQ SEQUENCE 652 AA; 73244 MW; 80CE37ADCC19771E CRC64;



Query Match 20.1%; Score 702.5; DB 1; Length 652;

Best Local Similarity 29.4%; Pred. No. 1.7e-44;

Matches 195; Conservative 127; Mismatches 252; Indels 89; Gaps 18;



Qy 24 QDSLFSSESDNS LYFTYSGQSNTLEVRDLTYQVDIASQV-PWFEQLAQFKIPWRSHS 79 

I I : : I : : I I : : I I : : : : I I I I I 

Db 27 QGSVTGTEARHSLGVLHVSYS VSNRVGPW WNIKS 60 

Qy 80 SQDSCELGI-RNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQP 138 

I : I : : : I : I I I : : I : I I I I I : : I I I I : I I I : : : : I I 

Db 61 CQQKWDRQI LKDVS L YI ES GQIMCI LGS SGS GKTTLLDAI S GRLRRTGTLEGEVFVNGCE 120 

Qy 139 STPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLR 198 

: I : : I I I I : I M I I I I : I : I I : I : I : I I I : I i I 
Db 121 LRRDQFQDCFSYVLQSDVFLSSLTWETLRYTAMIALCRS-SADFYNKKVEAVMTELSLS 179 

Qy 199 QCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRL 258 

I : : I : I : I I I I I I I I I I I I : I : : : I I I I I : I I I I I : : I I : I 

Db 180 HVADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVl^LDEPTTGLDCMTANQIVLLLAEL 239 

Qy 259 AKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPA 318 

I : : I : I : : : : I I I I I : : I : I I : : : I I : : I : : I : : I : I : I I I : I I I 
Db 240 ARRDRIVIVTIHQPRSELFQHFDKIAI LTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPF 299 

Qy 319 DFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLT 378 

I I I : I I I I : I : I : I I I : I : : I I II : I I : II 

Db 300 DFYMDLTSVDTQSREREIETYKRVQMLECAFKES-DIYHKILENIERARYLKTLPTVPFK 358 

Qy 379 LTQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLIIGF--LYYG 436 

I : I III: 1:11 I I : : : : : : I I : I I 

Db 359 -TKDP P GMFGKL G VL L RRVT RN LMRN KQAVI MRL VQN L I MGL FL I F YL L RVQ 409 

Qy 437 HG7VKQLSFMDTAALLFMIGALIPFNVILDWSKCHSERSMLYYELEDGLYTAGPYFFAKI 496 

: : : I I I : : I : : I : I : I : : I : I I I I I : 

Db 410 NN T L KGAVQD RVGLL YQ LVGAT P YT GMLNAVN L F PML RAVS DQ E S Q DGL YHKWQML LAYV 469 

Qy 497 LGELPEHCAYVIIYAMPIYWLTNLRPVPELF LL -- HFLLVWLWFCCRTMALAA 548

III : I : : II II I I I I : : I hi 
Db 470 LHVLPFSVIATVIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFL TLVLLG 523 

Qy 549 SAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQIQFN 608 

I : : I : : : I I : I : : I : : I : : I I : : I 

Db 524 IVQNPNI-VNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILWNEFY 582 

Qy 609 GHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISY GFL 654 

I III : I : : | | : I I I : : II 

Db 583 GL NFTCGGSNTSML NHPMCA 1 TQ GVQ FI EKT C P GAT S RFTAN FL 626

Qy 655 FLY 657 

I I 

Db 627 ILY 629 



RESULT 6 
ABG5_HUMAN 

ID ABG5_HUMAN STANDARD; PRT; 651 AA. 

AC Q9H222; 

DT 28-FEB-2003 (Rel. 41, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 

DT 10-OCT-2003 (Rel. 42, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 5 (Sterolin-1).

GN ABCG5.

OS Homo sapiens (Human).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo. 

OX NCBI_TaxID=9606; 

RN [1] 

RP SEQUENCE FROM N.A., AND VARIANT GLU-604. 

RC TISSUE=Liver; 

RX MEDLINE=20553648; PubMed=11099417;

RA Berge K.E., Tian H., Graf G.A., Yu L., Grishin N.V., Schultz J.,

RA Kwiterovich P., Shan B., Barnes R., Hobbs H.H.;

RT "Accumulation of dietary cholesterol in sitosterolemia caused by

RT mutations in adjacent ABC transporters.";

RL Science 290:1771-1775(2000).

RN [2]

RP SEQUENCE FROM N.A., VARIANTS SITOSTEROLEMIA HIS-389; HIS-419 AND

RP PRO-419, AND VARIANT GLU-604.

RC TISSUE=Liver;

RX MEDLINE=20578753; PubMed=11138003;

RA Lee M.-H., Lu K., Hazard S., Yu H., Shulenin S., Hidaka H., Kojima H.,

RA Allikmets R., Sakuma N., Pegoraro R., Srivastava A.K., Salen G.,

RA Dean M., Patel S.B.;

RT "Identification of a gene, ABCG5, important in the regulation of 

RT dietary cholesterol absorption."; 

RL Nat. Genet. 27:79-83(2001). 

RN [3] 

RP REVIEW. 

RX MEDLINE=21474438; PubMed=11590207;

RA Schmitz G., Langmann T., Heimerl S.;

RT "Role of ABCG1 and other ABCG family members in lipid metabolism."; 

RL J. Lipid Res. 42:1513-1520(2001). 

RN [4] 

RP VARIANTS SITOSTEROLEMIA GLN-146; HIS-389; PRO-419; HIS-419 AND 

RP SER-550, AND VARIANT GLU-604. 

RX MEDLINE=21344600; PubMed=11452359;

RA Lu K., Lee M.-H., Hazard S., Brooks-Wilson A., Hidaka H., Kojima H.,

RA Ose L., Stalenhoef A.F.H., Mietinnen T., Bjorkhem I., Bruckert E.,

RA Pandya A., Brewer H.B. Jr., Salen G., Dean M., Srivastava A.K.,

RA Patel S.B.;

RT "Two genes that map to the STSL locus cause sitosterolemia: genomic

RT structure and spectrum of mutations involving sterolin-1 and

RT sterolin-2, encoded by ABCG5 and ABCG8, respectively.";

RL Am. J. Hum. Genet. 69:278-290(2001). 

CC -!- FUNCTION: Transporter that appears to play an indispensable role 
CC in the selective transport of the dietary cholesterol in and out 

CC of the enterocytes and in the selective sterol excretion by the 

CC liver into bile. 

CC -!- SUBUNIT: May form heterodimers with ABCG8 or be tightly coupled to 
CC ABCG8 along a pathway regulating dietary-sterol absorption and

CC excretion. 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Probable). 

CC -!- TISSUE SPECIFICITY: Strongly expressed in the liver, lower levels 

CC in the small intestine and colon.

CC -!- DISEASE: Defects in ABCG5 are a cause of sitosterolemia

CC [MIM:210250]; also known as phytosterolemia or shellfish

CC sterolemia. It is a rare autosomal recessive disorder

CC characterized by increased intestinal absorption of all sterols

CC including cholesterol, plant and shellfish sterols, and decreased

CC biliary excretion of dietary sterols into bile. Sitosterolemia 

CC patients have hypercholesterolemia, very high levels of plant 

CC sterols in the plasma, and frequently develop tendon and tuberous 

CC xanthomas, accelerated atherosclerosis and premature coronary 

CC artery disease. 

CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White) 
CC subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; AF320293; AAG40003.1; -. 

DR EMBL; AF312715; AAG53099.1; -. 

DR Genew; HGNC:13886; ABCG5.

DR MIM; 605459; -.

DR MIM; 210250; -.

DR GO; GO:0030299; P:cholesterol absorption; NAS.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; FALSE_NEG.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW ATP-binding; Glycoprotein; Transmembrane; Transport; Polymorphism; 

KW Disease mutation. 



FT DOMAIN 1 383 CYTOPLASMIC (POTENTIAL).


FT TRANSMEM 384 404 1 (POTENTIAL).
FT DOMAIN 405 421 EXTRACELLULAR (POTENTIAL).


FT TRANSMEM 422 442 2 (POTENTIAL).
FT DOMAIN 443 462 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 463 483 3 (POTENTIAL).
FT DOMAIN 484 503 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 504 524 4 (POTENTIAL).
FT DOMAIN 525 528 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 529 549 5 (POTENTIAL).
FT DOMAIN 550 623 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 624 644 6 (POTENTIAL).
FT DOMAIN 645 651 CYTOPLASMIC (POTENTIAL).
FT NP_BIND 86 93 ATP (POTENTIAL).
FT CARBOHYD 584 584 N-LINKED (GLCNAC...) (POTENTIAL).
FT CARBOHYD 591 591 N-LINKED (GLCNAC...) (POTENTIAL).
FT VARIANT 146 146 E -> Q (in sitosterolemia).
FT /FTId=VAR_012244.
FT VARIANT 389 389 R -> H (in sitosterolemia).
FT /FTId=VAR_012245.
FT VARIANT 419 419 R -> H (in sitosterolemia).
FT /FTId=VAR_012246.
FT VARIANT 419 419 R -> P (in sitosterolemia).
FT /FTId=VAR_012247.
FT VARIANT 550 550 R -> S (in sitosterolemia).
FT /FTId=VAR_012248.
FT VARIANT 604 604 Q -> E.
FT /FTId=VAR_012249.

SQ SEQUENCE 651 AA; 72503 MW; 950BABFCBB6A1536 CRC64; 

Query Match 19.9%; Score 697; DB 1; Length 651; 

Best Local Similarity 29.1%; Pred. No. 4.3e-44; 

Matches 195; Conservative 129; Mismatches 263; Indels 84; Gaps 18;

Qy 17 LQDASGLQDSL FSSESDNSLYFTYSGQSNTLEVRDLTYQVDIASQVPWFEQLAQFK 72 

II I I II :: :M :: I : II ||:: : : 

Db 15 LQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVR PWWD-ITSCR 61 

Qy 73 IPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGR-GHGGKMKSGQ 131 

I : : : : I I I I I : : I : I I I I I : : I I I : : I I I I I : 

Db 62 QQWTRQI LKDVS LYVESGQIMCI LGS S GSGKTTLLDAMS GRLGRAGT F- LGE 112 

Qy 132 IWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDV 191 

: : : I I : : : I : : I I I I I : I I I I I I I : I : : I : hill 
Db 113 VYWGRALRREQFQDCFSYXn^QSDTLLSSLTVT^ETLHYTALLAI-RRGNPGSFQKKVEAV 171 

Qy 192 IAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNL 251 

: I I I I I : : I I : I : I I I II I I I I I I I : I : : : I I I I : I I I I I : : 
Db 172 MAELSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQI 231 

Qy 252 VTTLS RLAKGNRLVLI SLHQPRS DI FRLFDLVLLMT S GT P I YLGAAQQMVQYFT S I GHPC 311 

I I | | : M : | : : : : | | I I I : : I : I I I : : :: I hi : I : : I I : I I 
Db 232 WLLVELARRNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPC 291 

Qy 312 PRYSNPADFYVT)LTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTS 371 

I : I I I I I I : I I I I : I : I I I I I : I : : I : : : : ■ I : 

Db 292 PEHSNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSA -ICHKTLKNIERM 345 

Qy 372 THTVSLTL TQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMS 427 

I : I : I : I : II : : I : I I II: :: : : I 

Db 346 KHLKTLPMVPFKTKDS P GVFS KL GVLL RRVT RNLVRN KLAVI T RL LQN L I MG 397 

Qy 428 LIIGFLYYGHGAKQL— SFMDTAALLEMIGALIPFNVILDWSKCHSERSMLYYELEDGL 485 

I : I : I : I I I : I : : I : I : I :: I : I I I 

Db 398 LFLLFFVLRWSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGL 457 

Qy 486 YTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELF LL — HFLLVWLV 537 

I I I I I : I :: II II I I I I : : I 

Db 458 YQKWQMMLAYALHVLPFSWATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFL- 516 

Qy 538 VFCCRTMALAASAMLPTFHMS S FFCNALYN S FYLTAGFMI NLDNLWI VPAWI S KLS FLRW 597 

I : I I : : I : : I I : I : : I I I : I : : 

Db 517 TLVLLGIVQNPNI-VNSWALLSIAGVLVGSGFLRNIQEMPIPFKIISYFTFQKY 570 

Qy 598 CFSGLMQIQFNGHLYTTQIGNFTFSILGDTM ISAMDLNSHPLY 640 

I I : : I I : I I : I : : I I : I I I 

Db 571 CSEILWNEFYGLNFT — CGSSNVS VTTNPMCAFTQGIQFIEKTCPGATSRFTMNFLILY 628 

Qy 641 AIY — LIVIGI 649 

: |:::| I 
Db 629 SFIPALVILGI 639 



RESULT 7 
ABG2_HUMAN 

ID ABG2_HUMAN STANDARD; PRT; 655 AA. 

AC Q9UNQ0; O95374; Q9BY73; Q9NUS0;

DT 16-OCT-2001 (Rel. 40, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 

DT 28-FEB-2003 (Rel. 41, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 2 (Placenta-specific ATP 

DE binding cassette transporter) (Breast cancer resistance protein).

GN ABCG2 OR ABCP OR BCRP OR BCRP1.

OS Homo sapiens (Human).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.

OX NCBI_TaxID=9606;

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=Placenta; 

RX MEDLINE=99065313; PubMed=9850061;

RA Allikmets R., Schriml L.M., Hutchinson A., Romano-Spica V., Dean M.;

RT "A human placenta-specific ATP-binding cassette gene (ABCP) on 

RT chromosome 4q22 that is involved in multidrug resistance."; 

RL Cancer Res. 58:5337-5339(1998). 

RN [2] 

RP SEQUENCE FROM N.A. 

RC TISSUE=Breast cancer; 

RX MEDLINE=99080071; PubMed=9861027;

RA Doyle L.A., Yang W., Abruzzo L.V., Krogmann T., Gao Y., Rishi A.K.,

RA Ross D.D.;

RT "A multidrug resistance transporter from human MCF-7 breast cancer 

RT cells."; 

RL Proc. Natl. Acad. Sci. U.S.A. 95:15665-15670(1998). 

RN [3] 

RP ERRATUM. 

RA Doyle L.A., Yang W., Abruzzo L.V., Krogmann T., Gao Y., Rishi A.K.,

RA Ross D.D.;

RL Proc. Natl. Acad. Sci. U.S.A. 96:2569-2569(1999). 

RN [4] 

RP SEQUENCE FROM N.A. 

RA Kage K., Tsukahara S., Sugiyama T., Asada S., Ishikawa E., Tsuruo T.,

RA Sugimoto Y.;

RT "Breast cancer resistance protein constitutes a 140-kDa complex as

RT homodimer.";

RL Submitted (MAR-2001) to the EMBL/GenBank/DDBJ databases.

RN [5]

RP SEQUENCE OF 198-655 FROM N.A.

RC TISSUE=Placenta;

RA Isogai T., Ota T., Hayashi K., Sugiyama T., Otsuki T., Suzuki Y.,

RA Nishikawa T., Nagai K., Sugano S., Shiratori A., Sudo H.,

RA Wagatsuma M., Hosoiri T., Kaku Y., Kodaira H., Kondo H., Sugawara M.,

RA Takahashi M., Chiba Y., Ishida S., Murakawa K., Ono Y., Takiguchi S.,

RA Watanabe S., Kimura K., Murakami K., Ishii S., Kawai Y., Saito K.,

RA Yamamoto J., Wakamatsu A., Nakamura Y., Nagahari K., Masuho Y.,

RA Ninomiya K., Iwayanagi T.;

RT "NEDO human cDNA sequencing project.";

RL Submitted (FEB-2000) to the EMBL/GenBank/DDBJ databases.
RN [6]
RP REVIEW.
RX MEDLINE=21474438; PubMed=11590207;
RA Schmitz G., Langmann T., Heimerl S.;
RT "Role of ABCG1 and other ABCG family members in lipid metabolism.";
RL J. Lipid Res. 42:1513-1520(2001).
CC -!- FUNCTION: Xenobiotic transporter that appears to play a major role
CC in the multidrug resistance phenotype of a specific MCF-7 breast
CC cancer cell line. When overexpressed, the transfected cells become
CC resistant to mitoxantrone, daunorubicin and doxorubicin, display
CC diminished intracellular accumulation of daunorubicin, and
CC manifest an ATP-dependent increase in the efflux of rhodamine 123.
CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Probable).
CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White)
CC subfamily.
CC
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
CC
DR EMBL; AF103796; AAD09188
DR EMBL; AF098951; AAC97367
DR EMBL; AB056867; BAB39212
DR EMBL; AK002040; BAA92050
DR Genew; HGNC:74; ABCG2.
DR MIM; 603756; -.
DR GO; GO:0016021; C:integral to membrane; TAS.
DR GO; GO:0005524; F:ATP binding; TAS.
DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; TAS.
DR GO; GO:0005215; F:transporter activity; TAS.
DR GO; GO:0008559; F:xenobiotic-transporting ATPase activity; TAS.
DR GO; GO:0009315; P:drug resistance; TAS.
DR GO; GO:0006810; P:transport; TAS.
DR InterPro; IPR003593; AAA_ATPase.
DR InterPro; IPR003439; ABC_transporter.
DR Pfam; PF00005; ABC_tran; 1.
DR ProDom; PD000006; ABC_transporter; 1.
DR SMART; SM00382; AAA; 1.
DR PROSITE; PS00211; ABC_TRANSPORTER_1; FALSE_NEG.
DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.
KW ATP-binding; Transmembrane; Transport.
FT DOMAIN 1 395 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 396 416 POTENTIAL.
FT DOMAIN 417 428 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 429 449 POTENTIAL.
FT DOMAIN 450 477 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 478 498 POTENTIAL.
FT DOMAIN 499 506 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 507 527 POTENTIAL.
FT DOMAIN 528 535 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 536 556 POTENTIAL.
FT DOMAIN 557 630 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 631 651 POTENTIAL.
FT DOMAIN 652 655 CYTOPLASMIC (POTENTIAL).



FT NP_BIND 80 87 ATP (POTENTIAL).
FT CARBOHYD 418 418 N-LINKED (GLCNAC...) (POTENTIAL).
FT CARBOHYD 557 557 N-LINKED (GLCNAC...) (POTENTIAL).
FT CARBOHYD 596 596 N-LINKED (GLCNAC...) (POTENTIAL).
FT CONFLICT 24 24 V -> A (IN REF. 2 AND 4).
FT CONFLICT 166 166 E -> Q (IN REF. 2 AND 4).
FT CONFLICT 208 208 F -> S (IN REF. 1).
FT CONFLICT 315 316 MISSING (IN REF. 5).
FT CONFLICT 482 482 R -> T (IN REF. 2).
SQ SEQUENCE 655 AA; 72343 MW; 89A6D3511DC5CCE0 CRC64;



Query Match 18.8%; Score 657; DB 1; Length 655; 

Best Local Similarity 26.8%; Pred. No. 4e-41; 

Matches 186; Conservative 138; Mismatches 271; Indels 100; Gaps 20; 

Qy 32 SDNSLYFTYSGQSNT LEVRDLTYQVDIASQVPWFEQLAQFK 72 

III III I : : I : I : I 
Db 3 SSNVEVFIPVSQGNTNGFPATVSNDLKAFTEGAVLSFHNICYRVKLKSGF 52 

Qy 73 IPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQI 132 

: I I : : : I : : : : I : I I : I : I I : : I I I I I : I : I I : 

Db 53 LPCRKPVEKEI LSNINGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPSGL-SGDV 106 

Qy 133 WINGQPSTPQLVRKC-VAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDV 191 

I II I II : | I I : : I I I I I I I I : I I I : : : : : I : I 

Db 107 LINGAPRPANF — KCNSGYWQDDWMGTLTVRENLQFSAALRLATTMTNHEKNERINRV 164 

Qy 192 IAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNL 251 

I II I : I : : : I I : : I I I I I I I I : I I I I : : I : : I II I I I I I : I I I I II: : 
Db 165 IEELGLDKVADSKVGTQFIRGVSGGERKRTSIGMELITDPSILFLDEPTTGLDSSTANAV 224 

Qy 252 WTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPC 311 

: I I : : I I : : I : I I I I I I : I I I : I : I I : : I I I : : II I I : I 
Db 225 LLLLKRMSKQGRTI I FS IHQPRYS I FKLFDSLTLLAS GRLMFHGPAQEALGYFESAGYHC 284 

Qy 312 P R Y S N PAD FYVD LT S I D RRS KEREVATVEK AQSLAALFLEKVQGFD 357 

I : II I I I : : I : : I : I I : I : I I : : : 
Db 285 EAYNNPADFFLDIINGDSTAVALNREEDFKATEIIEPSKQDKPLIEKLAEIYVN — S 339 

Qy 358 DFL — WKAEAKELNTSTHTVS LTLTQDTDCGTAVELPGMI EQFSTLI RRQI SNDFRDLPT 415 

I I I I : I : : | : : : | : | : : | | : 

Db 340 SFYKETKAELHQLSGGEKKKKITVFKEISYTTS FCHQLRWVSKRSFKNLLGNPQA 394 

Qy 416 LLIHGSEACLMSLI I GFLYYGHGAKQLS FMDTAALLFMI GALI PFNVT LDWSKCHS 472 

: : : | : | | : | : | : | : | | : : : I I 

Db 395 SIAQIIVTWLGLVIGAIYFGLKNDSTGIQNRAGVLFFL TTNQCFSSVS 443 

Qy 473 ERSMLYYELEDGLYTAGPYFFAKILGE-LPEHCAYVIIYAMPIYWLTNLRPVP 524 

I : : : I II II I : I : I I I I : : I : : I : I 

Db 444 AVELFVVEKKLFIHEYISGYYRVSSYFLGKLLSDLLPMRMLPSIIFTCIVYFMLGLKPKA 503 

Qy 525 ELFLLHFLLVWLWFCCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNL— 582 

: I : : : I : : I I I I : I : : : : : : | : : I I : 

Db 504 DAFFVMMFTLMMVAYSASSMALAIAAGQSWSVATLLMTICFVFMMIFSGLLVNLTTIAS 563 

Qy 583 WIVPAWISKLSFLRWCFSGLMQIQFNGHLYTTQIG N FT FS I LGDTMI — SAMD 633 

I : : I : I I : I : I : I I : : | : : | : : : | 



Db 564 WL--SWLQYFSIPRYGFTALQHNEFLGQNFCPGLNATGNNPCNYA-TCTGEEYLVKQGID 620



Qy 634 LNSHPLYAIYLIVIGISYGFLFLYYLSLKLIKQKS 668 

I : I : : : : : I I : I I I : I : I 
Db 621 LSPWGLWKNHVALACMIVI FLTIAYLKLLFLKKYS 655 



RESULT 8 
YOH5_YEAST 

ID YOH5_YEAST STANDARD; PRT; 1294 AA. 

AC Q08234; Q08233; 

DT 01-NOV-1997 (Rel. 35, Created) 

DT 16-OCT-2001 (Rel. 40, Last sequence update) 

DT 16-OCT-2001 (Rel. 40, Last annotation update) 

DE Probable ATP-dependent transporter YOL074C/YOL075C.

GN YOL074C/YOL075C.

OS Saccharomyces cerevisiae (Baker's yeast).

OC Eukaryota; Fungi; Ascomycota; Saccharomycotina; Saccharomycetes;

OC Saccharomycetales; Saccharomycetaceae; Saccharomyces.

OX NCBI_TaxID=4932;

RN [1]

RP SEQUENCE FROM N.A.

RX MEDLINE=97321807; PubMed=9178509;

RA Tzermia M., Katsoulou C., Alexandraki D.;

RT "Sequence analysis of a 33.2 kb segment from the left arm of yeast 

RT chromosome XV reveals eight known genes and ten new open reading 

RT frames including homologues of ABC transporters, inositol 

RT phosphatases and human expressed sequence tags."; 

RL Yeast 13:583-589(1997). 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Potential). 

CC -!- SIMILARITY: Belongs to the ABC transporter family. MDR subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; Z74817; CAA99085.1; -. 

DR EMBL; Z74816; CAA99084.1; -.

DR PIR; S77690; S77690. 

DR GermOnline; 143497; -. 

DR SGD; S0005435; YOL075C. 

DR InterPro; IPR003593; AAA_ATPase. 

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 2.

DR ProDom; PD000006; ABC_transporter; 2.

DR SMART; SM00382; AAA; 2.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 2.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 2.

KW Hypothetical protein; ATP-binding; Transmembrane; Glycoprotein; 

KW Transport; Repeat. 

FT TRANSMEM 376 396 POTENTIAL. 

FT TRANSMEM 496 516 POTENTIAL. 

FT TRANSMEM 531 551 POTENTIAL. 
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• ) ( r U 1 rjN 1 1 A_l_i ) 


FT CARBOHYD 349 349 N-LINKED (GLCNAC...) (POTENTIAL).
FT CARBOHYD 371 371 N-LINKED (GLCNAC...) (POTENTIAL).
FT CARBOHYD 528 528 N-LINKED (GLCNAC...) (POTENTIAL).
FT CARBOHYD 983 983 N-LINKED (GLCNAC...) (POTENTIAL).
FT CARBOHYD 1062 1062 N-LINKED (GLCNAC...) (POTENTIAL).
SQ SEQUENCE 1294 AA; 145157 MW; C555500A45E92E ME CRC64;


Query Match 18.7%; Score 655; DB 1; Length 1294;



Best Local Similarity 28.1%; Pred. No. 1.4e-40; 

Matches 173; Conservative 115; Mismatches 272; Indels 56; Gaps 13; 

Qy 88 I RNLS FKVRSGQMLAI I GS SGCGRAS LLDVI TGRGHGGKMKS GQI 132 

: I : I I : : I : : I I I I : : I I : I : : II : I I 
Db 45 WTFSMDLPSGSVl^VMGGSGSGKTTLLNvXASKISGGLTHNGSIRYvljEDTGSEPNETE 104 

Qy 133 WINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKR- 187 

: : I I I : I : : I I I I M I I I I I I : : I : : : I I : 

Db 105 PKRAHLDGQ-DHPIQKHVIMAYLPQQDVLSPRLTCRETLKFAADLKL NSSERTKKL 159 

Qy 188 -VEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSF 246 

II : I II I : I I : I I I : I I : I I I I : I I : I I I I : : II I : I I I I I : I I I : : 
Db 160 MVEQLIEELGLKDCADTLVGDNSHRGLSGGEKRRLSIGTQMISNPSIMFLDEPTTGLDAY 219 

Qy 247 TAHNLWTLSRI^-GNRLVXISLHQPRSDIFRLFDLVXL^SGTPIYLGAAQQMVQYFT 305 

: I : : I I : I I I I : : I : I I I I I I I I I I : : : I : I : I I 

Db 220 SAFLVI KTLKKLAKEDGRTFIMSIHQPRSDILFLLDQVCILSKGNWYCDKMDNTIPYFE 279 

Qy 306 SIGHPCPRYSNPADFWDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFLWKAEA 365 

III: I : I I I I :: : I I : I : I I I : I I I : II : : I : I 
Db 280 SIGYHVPQLVNPADYFIDLSSVDSRSDKEEAATQSRLNSL IDHWHDY ER 328 

Qy 366 KELNTSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACL 425 

I : I : II : I : I I I : I I I I : : I : 

Db 329 THLQLQAESYISNATEIQIQNMTTRLP-FWKQWVXTRRNFKLNFSDYWLISTFAEPLI 387 

Qy 426 MS LI I GFLYYGHGAKQLS FMDTAALLFMI GAL I P — FNVI LDWSKCHSERSMLYYELED 483 

: : | :: | I : : I :: :: I I : :: I : 

Db 388 IGTVCGWIYYKPDKSSIGGLRTTTACLYASTILQCYLYLLFDTYRLCEQDIALYDRERAE 447 

Qy 484 GLYTAGPYFFA-KILGELPEHCAWIIYAMPIYWLTNLRPVPELFLLHFLLVWLWFCCR 542 

II : I I I I : I :|: I : : I I I :|:| I 

Db 448 GSVTPLAFIVARKISLFLSDDFAMTMI FVSITYFMFGLEADARKFFYQFAWFLCQLSCS 507 

Qy 543 TMALAASAMLPTFHMS S FFCNALYNS FYLTAGFMINLDNLWI VPAWI SKLS FLRWCFSGL 602 

::: : |: I :| I : : II :| : : || ::| : I I 
Db 508 GLSMLSVAVS RDFS KAS LVGNMT FTVl^SMGCGFFWAKVMPVYVRWI KYI AFTWYS FGTL 567 



Qy 603 MQIQFNGHLYTTQ IGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISY GFL 654 

I I II : I I : I I : I : I : : I I : 

Db 568 MSSTFTNSYCTTDNLDECLGNQILEVYG FPRNWITVPAWLLCWSVGYFWGAI 621 

Qy 655 FLYYLSLKLIKQKSIQ 670 

II : : I :: 
Db 622 I LYLHKI DITLQNEVK 637 



RESULT 9 
ABG1_MOUSE

ID ABG1_MOUSE STANDARD; PRT; 666 AA.

AC Q64343; 

DT 01-NOV-1997 (Rel. 35, Created) 

DT 01-NOV-1997 (Rel. 35, Last sequence update) 

DT 15-MAR-2004 (Rel. 43, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 1 (White protein homolog) 

DE (ATP-binding cassette transporter 8).

GN ABCG1 OR ABC8 OR WHT1.

OS Mus musculus (Mouse).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.

OX NCBI_TaxID=10090; 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE=97186700; PubMed=9034316;

RA Croop J.M., Tiller G.E., Fletcher J.A., Lux M.L., Raab E.,

RA Goldenson D., Son D., Arciniegas S., Wu R.;

RT "Isolation and characterization of a mammalian homolog of the

RT Drosophila white gene.";

RL Gene 185:77-85(1997).

RN [2] 

RP SEQUENCE FROM N.A. 

RC STRAIN=DBA/2; 

RX MEDLINE=96359154; PubMed=8703120; 

RA Savary S., Denizot F., Luciani M.-F., Mattei M.-G., Chimini G.;

RT "Molecular cloning of a mammalian ABC transporter homologous to 

RT Drosophila white gene."; 

RL Mamm. Genome 7:673-676(1996). 

RN [3] 

RP SEQUENCE FROM N.A. 

RX MEDLINE=21092576; PubMed=11162488;

RA Lorkowski S., Rust S., Engel T., Jung E., Tegelkamp K., Galinski E.A.,

RA Assmann G., Cullen P.;

RT "Genomic sequence and structure of the human ABCG1 (ABC8) gene.";

RL Biochem. Biophys. Res. Commun. 280:121-131(2001).

RN [4] 

RP INDUCTION, AND PROBABLE FUNCTION. 

RX MEDLINE=20261604; PubMed=10799558;

RA Venkateswaran A., Repa J.J., Lobaccaro J.-M.A., Bronson A.,

RA Mangelsdorf D.J., Edwards P.A.;

RT "Human white/murine ABC8 mRNA levels are highly induced in

RT lipid-loaded macrophages. A transcriptional role for specific

RT oxysterols.";

RL J. Biol. Chem. 275:14700-14707(2000). 

RN [5] 



RP REVIEW. 

RX MEDLINE=21474438; PubMed=11590207;
RA Schmitz G., Langmann T., Heimerl S.;

RT "Role of ABCG1 and other ABCG family members in lipid metabolism."; 
RL J. Lipid Res. 42:1513-1520(2001). 

CC -!- FUNCTION: Transporter involved in macrophage lipid homeostasis. Is 
CC an active component of the macrophage lipid export complex. Could 

CC also be involved in intracellular lipid transport processes. The 

CC role in cellular lipid homeostasis may not be limited to 

CC macrophages. 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Probable). 
CC -!- TISSUE SPECIFICITY: Expressed mainly in brain, thymus, lung, 
CC adrenals, spleen and placenta. Little or no expression in liver, 

CC kidney, heart, muscle or testes. 

CC -!- INDUCTION: Strongly induced in macrophage cell line RAW264.7
CC during cholesterol influx. Induction is mediated by the liver X

CC receptor/retinoid X receptor (LXR/RXR) pathway.

CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White) 
CC subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; U34920; AAB47738.1; -.

DR EMBL; Z48745; CAA88636.1; -. 

DR EMBL; AF323659; AAK27442.1; -. 

DR MGD; MGI:107704; Abcg1.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR005284; Pigment_permease.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR TIGRFAMs; TIGR00955; 3a01204; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW Transport; Lipid transport; ATP-binding; Transmembrane. 
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Query Match 18.4%; Score 644; DB 1; Length 666; 

Best Local Similarity 26.1%; Pred. No. 3.8e-40; 

Matches 177; Conservative 142; Mismatches 270; Indels 88; Gaps 18; 

Qy 8 ETQLWNGTVLQDASGLQDSLFSSESDNSLYFTYSGQSNTL EVRDLTYQVDIA 59 

Mill: : I I : I I : : : : I I : I I : I I 
Db 45 ETDLLNGHL KKVDNN — FTEAQRFSSLPRRAAVNIEFKDLSYSV 86 

Qy 60 SQVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVIT 119 

: I I I : : : : : I I I I : : : I I : I I I I : : : I : : : : 

Db 87 PEGPW WKKKGYKTL LKGISGKFNSGELVAIMGPSGAGKSTLMNILA 132 

Qy 120 GRGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTF 179 

I 1111:1111 : II : : I I I I I : I I I : I : I : : I 

Db 133 GYRETG-MK-GAVLINGMPRDLRCFRKVSCYIMQDDMLLPHLTVQEAMMVSAHLKLQE — 188 

Qy 180 SQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEP 239 

1:1:::: II I I I I I I : : | | | : | : | : : | : : | : | | : : I I I 

Db 189 KDEGRREMVKEILTALGLLPCANTRTGS-- LSGGQRKRLAIALELVNNPPVMFFDEP 243 

Qy 240 TSGLDSFTAHNLWTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQ 299 

MINI: : I : : I I : I I : : : : I I I : : I I I I : : : : I : I I 
Db 244 TSGLDSASCFQWSLMKGLAQGGRSIVCTIHQPSAKLFELFDQLYVLSQGQCVYRGKVSN 303 

Qy 300 MVQYFTSIGHPCPRYSNPADFWDLTSIDRRSKEREVATVEKAQSIAALFLEKVQGFDD- 358 

: I I : I I I I I I I I I : : : I : : : : I : : I I 

Db 304 LVPYLRDLGLNCPTYHNPADFVMEVASGEYGDQNSRLVRAVREGMCDADYKRDLGGDTDV 363 

Qy 359 — FLWKAEAKELNTS THTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDL 413 

III I : I : I I : I : : I I I : I : I I 

Db 364 NPFLWHRPAEEDSASMEGCHSFSAS CLTQFCILFKRTFLSIMRDS 408 

Qy 414 PTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDWSKCHSE 473 

: : : | : | | | | | | : : : I I : I : : I I 

Db 409 VLTHLRITSHIGIGLLIGLLYLGIGNEAKKVLSNSGFLFFSMLFLMFAALMPTVLTFPLE 468 

Qy 474 RSMLYYELEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLL 533 

I : I : I : I : I I : : : I : : I : I I : I : I : I I 

Db 469 MSVFLREHLNYWYSLKAYYLAKTMADVPFQIMFPVAYCSIVYWMTSQPSDAVl^FVXFAAL 528

Qy 534 WLWFCCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLS 593 

: : : : I I : : : : I I : I I : : I : I : I : I 

Db 529 GTMTS LVAQS LGLLI GAASTS LQVATFVGPVTAI PVLLFS GFFVS FDT I PAYLQWMS YI S 588 

Qy 594 FLRWCFSGLMQIQFNG HLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVI 647 

::|: I I:: : I I : :| I : : : : I : : I I : I I : 
Db 589 YVRYGFEGVI-LSIYGLDREDLHCDIAETCHFQKS EAI LRELDVENAKL Y- LDFI VL 643 

Qy 648 GISYGFLFLYYLSLKLI 664

II ::: I I: I I 

Db 644 GI FFISLRLI 653 

RESULT 10 
ABG1_HUMAN 

ID ABG1_HUMAN STANDARD; PRT; 678 AA.



AC P45844; Q9BXK6; Q9BXK7; Q9BXK8; Q9BXK9; Q9BXL0; Q9BXL1; Q9BXL2; 

AC Q9BXL3; Q9BXL4; 

DT 01-NOV-1995 (Rel. 32, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 

DT 15-MAR-2004 (Rel. 43, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 1 (White protein homolog) 

DE (ATP-binding cassette transporter 8).

GN ABCG1 OR ABC8 OR WHT1.

OS Homo sapiens (Human).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo. 

OX NCBI_TaxID=9606; 

RN [1] 

RP SEQUENCE OF 3-678 FROM N.A. (ISOFORMS 1 AND 4). 

RC TISSUE=Retina; 

RX MEDLINE=96256850; PubMed=8659545;

RA Chen H.M., Rossier C., Lalioti M.D., Lynn A., Chakravarti A.,

RA Perrin G., Antonarakis S.E.;

RT "Cloning of the cDNA for a human homologue of the Drosophila white

RT gene and mapping to chromosome 21q22.3.";

RL Am. J. Hum. Genet. 59:66-75(1996). 

RN [2] 

RP SEQUENCE FROM N.A. (ISOFORM 1).

RX MEDLINE=20289799; PubMed=10830953;

RA Hattori M., Fujiyama A., Taylor T.D., Watanabe H., Yada T.,

RA Park H.-S., Toyoda A., Ishii K., Totoki Y., Choi D.-K., Groner Y.,

RA Soeda E., Ohki M., Takagi T., Sakaki Y., Taudien S., Blechschmidt K.,

RA Polley A., Menzel U., Delabar J., Kumpf K., Lehmann R., Patterson D.,

RA Reichwald K., Rump A., Schillhabel M., Schudy A., Zimmermann W.,

RA Rosenthal A., Kudoh J., Shibuya K., Kawasaki K., Asakawa S.,

RA Shintani A., Sasaki T., Nagamine K., Mitsuyama S., Antonarakis S.E.,

RA Minoshima S., Shimizu N., Nordsiek G., Hornischer K., Brandt P.,

RA Scharfe M., Schoen O., Desario A., Reichelt J., Kauer G., Bloecker H.,

RA Ramser J., Beck A., Klages S., Hennig S., Riesselmann L., Dagand E.,

RA Wehrmeyer S., Borzym K., Gardiner K., Nizetic D., Francis F.,

RA Lehrach H., Reinhardt R., Yaspo M.-L.;

RT "The DNA sequence of human chromosome 21.";

RL Nature 405:311-319(2000).

RN [3] 

RP SEQUENCE FROM N.A. (ISOFORM 1).

RX MEDLINE=20408883; PubMed=10950923;

RA Berry A., Scott H.S., Kudoh J., Talior I., Korostishevsky M.,

RA Wattenhofer M., Guipponi M., Barras C., Rossier C., Shibuya K.,

RA Wang J., Kawasaki K., Asakawa S., Minoshima S., Shimizu N.,

RA Antonarakis S.E., Bonne-Tamir B.;

RT "Refined localization of autosomal recessive nonsyndromic deafness 

RT DFNB10 locus using 34 novel microsatellite markers, genomic 

RT structure, and exclusion of six known genes in the region."; 

RL Genomics 68:22-29(2000). 

RN [4] 

RP SEQUENCE FROM N.A. (ISOFORM 1).

RX MEDLINE=21192304; PubMed=11279031;

RA Porsch-Oezcueruemez M., Langmann T., Heimerl S., Borsukova H.,

RA Kaminski W.E., Drobnik W., Honer C., Schumacher C., Schmitz G.;

RT "The zinc finger protein 202 (ZNF202) is a transcriptional repressor

RT of ATP binding cassette transporter A1 (ABCA1) and ABCG1 gene

RT expression and a modulator of cellular lipid efflux."; 



RL J. Biol. Chem. 276:12427-12433(2001). 

RN [5] 

RP SEQUENCE FROM N.A. (ISOFORMS 2; 3; 4; 5; 6 AND 7). 

RX MEDLINE=21092576; PubMed=11162488;

RA Lorkowski S., Rust S., Engel T., Jung E., Tegelkamp K., Galinski E.A.,

RA Assmann G., Cullen P.;

RT "Genomic sequence and structure of the human ABCG1 (ABC8) gene.";

RL Biochem. Biophys. Res. Commun. 280:121-131(2001).

RN [6] 

RP SEQUENCE OF 33-678 FROM N.A. 

RC TISSUE=Fetal brain; 

RX MEDLINE=97186700; PubMed=9034316;

RA Croop J.M., Tiller G.E., Fletcher J.A., Lux M.L., Raab E.,

RA Goldenson D., Arciniegas S., Son D., Wu R.;

RT "Isolation and characterization of a mammalian homolog of the

RT Drosophila white gene.";

RL Gene 185:77-85(1997).

RN [7] 

RP INDUCTION, AND PROBABLE FUNCTION. 

RX MEDLINE=20261604; PubMed=10799558;

RA Venkateswaran A., Repa J.J., Lobaccaro J.-M.A., Bronson A.,

RA Mangelsdorf D.J., Edwards P.A.;

RT "Human white/murine ABC8 mRNA levels are highly induced in

RT lipid-loaded macrophages. A transcriptional role for specific

RT oxysterols.";

RL J. Biol. Chem. 275:14700-14707(2000). 

RN [8] 

RP INDUCTION, AND PROBABLE FUNCTION. 

RX MEDLINE=20105556; PubMed=10639163;

RA Klucken J., Buechler C., Orso E., Kaminski W.E.,

RA Porsch-Oezcueruemez M., Liebisch G., Kapinsky M., Diederich W.,

RA Drobnik W., Dean M., Allikmets R., Schmitz G.;

RT "ABCG1 (ABC8), the human homolog of the Drosophila white gene, is a

RT regulator of macrophage cholesterol and phospholipid transport."; 

RL Proc. Natl. Acad. Sci. U.S.A. 97:817-822(2000). 

RN [9] 

RP REVIEW. 

RX MEDLINE=21474438; PubMed=11590207;

RA Schmitz G., Langmann T., Heimerl S.; 

RT "Role of ABCG1 and other ABCG family members in lipid metabolism."; 

RL J. Lipid Res. 42:1513-1520(2001).

CC -!- FUNCTION: Transporter involved in macrophage lipid homeostasis. Is 
CC an active component of the macrophage lipid export complex. Could 

CC also be involved in intracellular lipid transport processes. The 

CC role in cellular lipid homeostasis may not be limited to 

CC macrophages.

CC -!- SUBUNIT: May form heterodimers with several heterologous partners 
CC of the ABCG subfamily. 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein. Predominantly 
CC localized in the intracellular compartments mainly associated with 

CC the endoplasmic reticulum (ER) and Golgi membranes. 

CC -!- ALTERNATIVE PRODUCTS: 

CC Event=Alternative splicing; Named isoforms=7; 

CC Comment=Additional isoforms seem to exist; 

CC Name=1;

CC IsoId=P45844-1; Sequence=Displayed;

CC Name=2; Synonyms=J;

CC IsoId=P45844-2; Sequence=VSP_000047, VSP_000051;

CC Name=3; Synonyms=ABDE;

CC IsoId=P45844-3; Sequence=VSP_000048, VSP_000051;

CC Name=4; Synonyms=G;

CC IsoId=P45844-4; Sequence=VSP_000051;

CC Name=5; Synonyms=F;

CC IsoId=P45844-5; Sequence=VSP_000049, VSP_000051;

CC Name=6; Synonyms=H I;

CC IsoId=P45844-6; Sequence=VSP_000046, VSP_000051;

CC Name=7; Synonyms=C;

CC IsoId=P45844-7; Sequence=VSP_000050, VSP_000051;

CC -!- TISSUE SPECIFICITY: EXPRESSED IN SEVERAL TISSUES. 

CC -!- INDUCTION: Strongly induced in monocyte-derived macrophages during 
CC cholesterol influx. Conversely, mRNA and protein expression are 

CC suppressed by lipid efflux. Induction is mediated by the liver X 

CC receptor/retinoid X receptor (LXR/RXR) pathway.

CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White) 
CC subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; X91249; CAA62631.1; ALT_INIT.

DR EMBL; AP001746; BAA95530.1; ALT_INIT. 

DR EMBL; AB038161; BAB13728.2; ALT_INIT. 

DR EMBL; AJ289137; CAC00730.1; ALT_INIT. 

DR EMBL; AJ289138; CAC00730.1; JOINED. 

DR EMBL; AJ289139; CAC00730.1; JOINED. 

DR EMBL; AJ289140; CAC00730.1; JOINED. 

DR EMBL; AJ289141; CAC00730.1; JOINED. 

DR EMBL; AJ289142; CAC00730.1; JOINED. 

DR EMBL; AJ289143; CAC00730.1; JOINED. 

DR EMBL; AJ289144; CAC00730.1; JOINED. 

DR EMBL; AJ289145; CAC00730.1; JOINED. 

DR EMBL; AJ289146; CAC00730.1; JOINED. 

DR EMBL; AJ289147; CAC00730.1; JOINED. 

DR EMBL; AJ289148; CAC00730.1; JOINED. 

DR EMBL; AJ289149; CAC00730.1; JOINED. 

DR EMBL; AJ289150; CAC00730.1; JOINED. 

DR EMBL; AJ289151; CAC00730.1; JOINED. 

DR EMBL; AF323658; AAK28836.1; -. 

DR EMBL; AF323644; AAK28836.1; JOINED. 

DR EMBL; AF323645; AAK28836.1; JOINED. 

DR EMBL; AF323646; AAK28836.1; JOINED. 

DR EMBL; AF323647; AAK28836.1; JOINED. 

DR EMBL; AF323648; AAK28836.1; JOINED. 

DR EMBL; AF323649; AAK28836.1; JOINED. 

DR EMBL; AF323650; AAK28836.1; JOINED. 

DR EMBL; AF323651; AAK28836.1; JOINED. 

DR EMBL; AF323652; AAK28836.1; JOINED. 

DR EMBL; AF323653; AAK28836.1; JOINED. 

DR EMBL; AF323654; AAK28836.1; JOINED. 
Query Match 18.3%; Score 638; DB 1; Length 678;

Best Local Similarity 26.5%; Pred. No. 1.1e-39;

Matches 184; Conservative 138; Mismatches 262; Indels 110; Gaps 20; 

Qy 8 ETQLWNGTVLQDASGLQDS-LFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIASQVPWFE 66 

| I I I I : : : I :: I I I : : : I I I I : I I : II 

Db 45 ETDLLNGHLKKVDNNLTEAQRFSSLPRRA AVNI EFRDLS YSV PEGPW— 91 

Qy 67 QLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGK 126 

II : : : : I I | | : : : | | : | II I : : : I : : : : I I 

Db 92 WRKKGYKTL LKGISGKFNSGELVAIMGPSGAGKSTLMNILAGYRETG- 138 

Qy 127 MKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDK 186 

II I : I I I I : II : : | | | M : I I I : I : I : : I I : 

Db 139 MK-GAVLINGLPRDLRCFRKVSCYIMQDDMLLPHLTVQEAMMVSAHLKLQE--KDEGRRE 195

Qy 187 RVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSF 246

Db 196 250

Qy 247 TAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTS 306

Db 251 310

Qy 307 IGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDD- 358

Db 311 EVAS GE YGDQN S RLVRAVREGMCD S DHKRDLG 358

Qy 359 FLWKAEAKELNTSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQISNDFR 411 

Ml ::|: : I : I : || | :| : I 

Db 359 GDAEWPFLWHRPSEEVKQTKRLKGLRKDSSSMEGCHSFSASCLTQFCILFKRTFLSIMR 418 

Qy 412 DLPTLLIHGSEACLMSLIIGFLYYGHG — AKQL S FMDT AALL FMI GAL I P FNVI LD 465 

I : : : I : I I I I I I I I :: |: : I I Ihl 
Db 419 DSVLTHLRITSHIGIGLLIGLLYLGIGNEAKKVLSNSGFLFFSMLFLMFAALMP 472 

Qy 466 WSKCHSERSMLYYELEDGL YTAGPYFFAKILGELPEHCAYVIIYAMPIYW 516 

::| : II I: |: |: II : ::| : : I Ml 

Db 473 TVLTFPLEMGVFLREHLNYWYSLKAYYLAKTMADVPFQIMFPVAYCSIVYW 523 

Qy 517 LTNLRPVPELFLLHFLLVWLVVFCCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFM 576

: I : | : | I : : : : I I : : : : I I : I I 

Db 524 MTSQPSDAVRFVLFAALGTMTSLVAQSLGLLIGAASTSLQVATFVGPVTAIPVLLFSGFF 583

Qy 577 INLDNLWIVPAWISKLSFLRWCFSGLMQIQFNG HLYTTQIGNFTFSILGDTMIS 630 

:: I : I : I : I : : I : I I : : : I I : : I I : :: 
Db 584 VSFDTIPTYLQWMSYISYVRYGFEGVI-LSIYGLDREDLHCDIDETCHFQKSEAILR 639

Qy 631 AMDLNSHPLYAIYLIVIGISYGFLFLYYLSLKLI 664 

: I : : I I : I I : I I ::: I I : I I 

Db 640 ELDVENAKLY-LDFIVLGI FFISLRLI 665 



RESULT 11 
WHIT_DROME 

ID WHIT_DROME STANDARD; PRT; 687 AA. 

AC P10090; Q9V3A2; Q9XY33; 

DT 01-MAR-1989 (Rel. 10, Created) 

DT 01-NOV-1991 (Rel. 20, Last sequence update) 

DT 10-OCT-2003 (Rel. 42, Last annotation update) 

DE White protein. 

GN W OR EG:BACN33B1.1 OR CG2759. 

OS Drosophila melanogaster (Fruit fly).

OC Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; Pterygota; 

OC Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha;

OC Ephydroidea; Drosophilidae; Drosophila. 

OX NCBI_TaxID=7227; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=Head; 

RX MEDLINE=90221897; PubMed=2109311;

RA Pepling M., Mount S.M.;

RT "Sequence of a cDNA from the Drosophila melanogaster white gene.";

RL Nucleic Acids Res. 18:1633-1633(1990).

RN [2]

RP SEQUENCE FROM N.A.

RX MEDLINE=85134865; PubMed=6084717;

RA O'Hare K., Murphy C., Levis R., Rubin G.M.;

RT "DNA sequence of the white locus of Drosophila melanogaster."; 

RL J. Mol. Biol. 180:437-455(1984). 

RN [3] 

RP SEQUENCE FROM N.A. 

RX MEDLINE=21100348; PubMed=11156992;

RA Lukacsovich T., Asztalos Z., Awano W., Baba K., Kondo S., Niwa S.,

RA Yamamoto D.;

RT "Dual-tagging gene trap of novel genes in Drosophila melanogaster.";

RL Genetics 157:727-742(2001). 

RN [4] 

RP SEQUENCE FROM N.A. 

RC STRAIN=Berkeley; 

RX MEDLINE=20196006; PubMed=10731132;

RA Adams M.D., Celniker S.E., Holt R.A., Evans C.A., Gocayne J.D.,

RA Amanatides P.G., Scherer S.E., Li P.W., Hoskins R.A., Galle R.F.,

RA George R.A., Lewis S.E., Richards S., Ashburner M., Henderson S.N.,

RA Sutton G.G., Wortman J.R., Yandell M.D., Zhang Q., Chen L.X.,

RA Brandon R.C., Rogers Y.-H.C., Blazej R.G., Champe M., Pfeiffer B.D.,

RA Wan K.H., Doyle C., Baxter E.G., Helt G., Nelson C.R., Miklos G.L.G.,

RA Abril J.F., Agbayani A., An H.-J., Andrews-Pfannkoch C., Baldwin D.,

RA Ballew R.M., Basu A., Baxendale J., Bayraktaroglu L., Beasley E.M.,

RA Beeson K.Y., Benos P.V., Berman B.P., Bhandari D., Bolshakov S.,

RA Borkova D., Botchan M.R., Bouck J., Brokstein P., Brottier P.,

RA Burtis K.C., Busam D.A., Butler H., Cadieu E., Center A., Chandra I.,

RA Cherry J.M., Cawley S., Dahlke C., Davenport L.B., Davies P.,

RA de Pablos B., Delcher A., Deng Z., Mays A.D., Dew I., Dietz S.M.,

RA Dodson K., Doup L.E., Downes M., Dugan-Rocha S., Dunkov B.C., Dunn P.,

RA Durbin K.J., Evangelista C.C., Ferraz C., Ferriera S., Fleischmann W.,

RA Fosler C., Gabrielian A.E., Garg N.S., Gelbart W.M., Glasser K.,

RA Glodek A., Gong F., Gorrell J.H., Gu Z., Guan P., Harris M.,

RA Harris N.L., Harvey D.A., Heiman T.J., Hernandez J.R., Houck J.,

RA Hostin D., Houston K.A., Howland T.J., Wei M.-H., Ibegwam C.,

RA Jalali M., Kalush F., Karpen G.H., Ke Z., Kennison J.A., Ketchum K.A.,

RA Kimmel B.E., Kodira C.D., Kraft C., Kravitz S., Kulp D., Lai Z.,

RA Lasko P., Lei Y., Levitsky A.A., Li J.H., Li Z., Liang Y., Lin X.,

RA Liu X., Mattei B., McIntosh T.C., McLeod M.P., McPherson D.,

RA Merkulov G., Milshina N.V., Mobarry C., Morris J., Moshrefi A.,

RA Mount S.M., Moy M., Murphy B., Murphy L., Muzny D.M., Nelson D.L.,

RA Nelson D.R., Nelson K.A., Nixon K., Nusskern D.R., Pacleb J.M.,

RA Palazzolo M., Pittman G.S., Pan S., Pollard J., Puri V., Reese M.G.,

RA Reinert K., Remington K., Saunders R.D.C., Scheeler F., Shen H.,

RA Shue B.C., Siden-Kiamos I., Simpson M., Skupski M.P., Smith T.,

RA Spier E., Spradling A.C., Stapleton M., Strong R., Sun E.,

RA Svirskas R., Tector C., Turner R., Venter E., Wang A.H., Wang X.,

RA Wang Z.-Y., Wassarman D.A., Weinstock G.M., Weissenbach J.,

RA Williams S.M., Woodage T., Worley K.C., Wu D., Yang S., Yao Q.A.,

RA Ye J., Yeh R.-F., Zaveri J.S., Zhan M., Zhang G., Zhao Q., Zheng L.,

RA Zheng X.H., Zhong F.N., Zhong W., Zhou X., Zhu S., Zhu X., Smith H.O.,

RA Gibbs R.A., Myers E.W., Rubin G.M., Venter J.C.;

RT "The genome sequence of Drosophila melanogaster."; 

RL Science 287:2185-2195(2000). 

RN [5] 

RP SEQUENCE FROM N.A. 

RC STRAIN=Oregon-R; 

RX MEDLINE=20196011; PubMed=10731137;

RA Benos P.V., Gatt M.K., Ashburner M., Murphy L., Harris D.,

RA Barrell B.G., Ferraz C., Vidal S., Brun C., Demailles J., Cadieu E.,

RA Dreano S., Gloux S., Lelaure V., Mottier S., Galibert F., Borkova D.,

RA Minana B., Kafatos F.C., Louis C., Siden-Kiamos I., Bolshakov S.,

RA Papagiannakis G., Spanos L., Cox S., Madueno E., de Pablos B.,

RA Modolell J., Peter A., Schoettler P., Werner M., Mourkioti F.,

RA Beinert N., Dowe G., Schaefer U., Jaeckle H., Bucheton A.,

RA Callister D.M., Campbell L.A., Darlamitsou A., Henderson N.S.,

RA McMillan P.J., Salles C., Tait E.A., Valenti P., Saunders R.D.C.,

RA Glover D.M.;

RT "From sequence to chromosome: the tip of the X chromosome of D.

RT melanogaster.";

RL Science 287:2220-2222(2000). 

RN [6] 

RP SEQUENCE OF 224-331 FROM N.A. 

RX MEDLINE=89339145; PubMed=2503416; 

RA Tearle R.G., Belote J.M., McKeown M., Baker B.S., Howells A.J.;

RT "Cloning and characterization of the scarlet gene of Drosophila 

RT melanogaster."; 

RL Genetics 122:595-606(1989). 

CC -!- FUNCTION: Part of a membrane-spanning permease system necessary 
CC for the transport of pigment precursors into pigment cells 

CC responsible for eye color. White dimerizes with brown for the

CC transport of guanine and with scarlet for the transport of 

CC tryptophan. 

CC -!- SUBUNIT: Heterodimer of white with either brown or scarlet. 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein. 

CC -!- SIMILARITY: Belongs to the ABC transporter family. MDR subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; X51749; CAA36038.1; -. 

DR EMBL; X02974; CAA26716.1; -. 

DR EMBL; AB028139; BAA78210.1; -. 

DR EMBL; AE003425; AAF45826.1; -. 

DR EMBL; AL133506; CAB65847.1; -. 

DR EMBL; X76202; CAA53795.1; -. 

DR PIR; S08635; FYFFW. 

DR FlyBase; FBgn0003996; w. 

DR GO; GO:0004888; F:transmembrane receptor activity; NAS.

DR GO; GO:0006727; P:ommochrome biosynthesis; IMP.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR005284; Pigment_permease.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR TIGRFAMs; TIGR00955; 3a01204; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW Pigment; ATP-binding; Transmembrane; Transport. 

FT NP_BIND 130 137 ATP (BY SIMILARITY).

FT TRANSMEM 435 453 POTENTIAL.

FT TRANSMEM 465 485 POTENTIAL.

FT TRANSMEM 515 533 POTENTIAL.

FT TRANSMEM 542 563 POTENTIAL.

FT TRANSMEM 576 594 POTENTIAL.

FT TRANSMEM 659 678 POTENTIAL.

FT CONFLICT 25 29 GDSGA -> LIFEIPYHCRVTAD (IN REF. 2 AND

FT 3).

FT CONFLICT 49 49 L -> R (IN REF. 4 AND 5).

FT CONFLICT 335 371 VGAQCPTNYNPADFYVQVLAVVPGREIESRDRIAKIC ->

FT ITLHLNSYPAWVPSVLPTTIRRTFTYRCWPLCPDGRSSPVI
FT GSPRYG (IN REF. 3).

SQ SEQUENCE 687 AA; 75672 MW; 24AFAD799DE0D396 CRC64; 

Query Match 18.0%; Score 630.5; DB 1; Length 687; 

Best Local Similarity 29.4%; Pred. No. 4e-39; 

Matches 179; Conservative 112; Mismatches 246; Indels 71; Gaps 13;

Qy 88 IRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGG--KMKSGQIWINGQPSTPQLVR 145

: : I : I : : I I : : I I I I I : : I I : : I I II : I I I I : : : 

Db 113 LKNVCGVAYPGELLAVMGSSGAGKTTLLNALAFRSPQGIQVSPSGMRLLNGQPVDAKEMQ 172

Qy 146 KCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRV 205 

I : I : I I : : I I I I I I I : I : I I : II I I : I I I I I : I : I : 
Db 173 ARCAYVQQDDLFIGSLTAREHLIFQAMVRMPRHLTYRQRVARVDQVIQELSLSKCQHTII 232 

Qy 206 G-NTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRL 264

I | : | : | | | | | : | : : : I : I : I I II I I I I I I I I I I I :: I I : I : : : 

Db 233 GVPGRVKGLSGGERKRLAFASEALTDPPLLICDEPTSGLDSFTAHSVVQVLKKLSQKGKT 292

Qy 265 VLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPADFYV-- 322

|::::IM |::| III :||| I :|| : I :|: :| || 1 I I I I I I 
Db 293 VILTIHQPSSELFELFDKILLMAEGRVAFLGTPSEAVDFFSYVGAQCPTNYNPADFYVQV 352

Qy 323 DLTSIDRRSKEREVATVEKA QSLAALFLEK — VQGFDDFLWKAEAKEL 368 

: : I I I : I : : I III II I I : : : I I 
Db 353 LAVVPGREIESRDRIAKICDNFAISKVARDMEQLLATKNLEKPLEQPENGYTYKAT 408

Qy 369 NTSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSL 428 

I I :: I : :: : : : :::: 
Db 409 WFMQFRAVLWRSWLSVLKEPLLVKVRLIQTTMVAI 443 

Qy 429 IIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDWSKCHSERSMLYYELEDGLYTA 488

: I I : : I I : I : : I : : I : : : II: I II 

Db 444 LIGLIFLGQQLTQVGVMNINGAIFLFLTNMTFQNVFATINVFTSELPVFMREARSRLYRC 503

Qy 489 GPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFL LVWLWFCCRTM 544 

I I I : I I I : : : I : I I : I I I I I I I : 

Db 504 DTYFLGKTIAELPLFLTVPLVFTAIAYPMIGLR AGVLH FFNC LALVT LVANVS T S F 559 

Qy 545 ALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQ 604

I : I : : II I I : I : : : I : I I I : I : I I : 

Db 560 GYLISCASSSTSMALSVGPPVIIPFLLFGGFFLNSGSVPVYLKWLSYLSWFRYANEGLLI 619 

Qy 605 IQF NGHLYTTQI GN FT FS I LGDTMI SAMDLNSHPLYAIYLIVIGISYGFLF 655 

I : I : I II I : : I I I I I I : I : : : I I 
Db 620 NQWADVEPGEISCTS-SNTTCPSSGKVILETLNFSAADL PLDYVGLAILIVS — FRV 673 

Qy 656 LYYLSLKL 663 

I I I : I : I 
Db 674 LAYLALRL 681 



RESULT 12 
WHIT_CERCA



ID WHIT_CERCA STANDARD; PRT; 679 AA. 

AC Q17320; 

DT 01-NOV-1997 (Rel. 35, Created)

DT 01-NOV-1997 (Rel. 35, Last sequence update)

DT 16-OCT-2001 (Rel. 40, Last annotation update) 

DE White protein. 

GN W. 

OS Ceratitis capitata (Mediterranean fruit fly).

OC Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; Pterygota; 

OC Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha;

OC Tephritoidea; Tephritidae; Ceratitis. 

OX NCBI_TaxID=7213; 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE=96123276; PubMed=8533095;

RA Zwiebel L.J., Saccone G., Zacharopoulou A., Besansky N.J.,

RA Favia G., Collins F.H., Louis C., Kafatos F.C.;

RT "The white gene of Ceratitis capitata: a phenotypic marker for 

RT germline transformation."; 

RL Science 270:2005-2007(1995). 

CC -!- FUNCTION: May be part of a membrane-spanning permease system 
CC necessary for the transport of pigment precursors into pigment 

CC cells responsible for eye color. 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein. 

CC -!- SIMILARITY: Belongs to the ABC transporter family. MDR subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; X89933; CAA61998.1; -. 

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR005284; Pigment_permease.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR TIGRFAMs; TIGR00955; 3a01204; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW Pigment; ATP-binding; Transmembrane; Transport. 



FT NP_BIND 121 128 ATP (BY SIMILARITY).

FT TRANSMEM 427 445 POTENTIAL.

FT TRANSMEM 457 477 POTENTIAL.

FT TRANSMEM 507 525 POTENTIAL.

FT TRANSMEM 534 555 POTENTIAL.

FT TRANSMEM 568 586 POTENTIAL.

FT TRANSMEM 651 670 POTENTIAL.

FT CARBOHYD 628 628 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 643 643 N-LINKED (GLCNAC...) (POTENTIAL).

SQ SEQUENCE 679 AA; 75145 MW; 3F9CBC78A835C4CC CRC64;

Query Match 17.7%; Score 617.5; DB 1; Length 679;



Best Local Similarity 27.0%; Pred. No. 3.6e-38; 

Matches 183; Conservative 126; Mismatches 271; Indels 97; Gaps 16; 

Qy 37 YFTYSGQSNTLEVRDLT YQVDIASQV PWFEQLAQFK IPW-RSHS 79 

I I I I I : I I I : I : I I : : : I I I I I 

Db 44 YGTLSPPSPALTADNLTYSWYNLDVFGAVHQPGSSWKQLVNRVKGVFCNERHIPAPRKHL 103 

Qy 80 SQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQI--WINGQ 137

: : I I I : : I I : : I I I I I : : M : I I j : I 

Db 104 L KN D S GVAY P G EL LAVMG S S GAGKT T LLN AS AFRS S KGVQ I S P S T I RMLN GH 155 

Qy 138 PSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRL 197 

I : : : I : I : I I : : I I I I I I I : I : I I : I I : : I I : I I : I I 
Db 156 PVDAKEMQARCAYVQQDDLFIGSLTAREHLIFQAMVRMPRHMTQKQKVQRVDQVIQDLSL 215 

Qy 198 RQCANTRVG-NTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLS 256 

: I I I : I I : I : I I I I I : I : : : I : I : I I I I I I I I I I I I I I : : I I 

Db 216 GKCQNTLIGVPGRVKGLSGGERKRLAFASEALTDPPLLICDEPTSGLDSFMAHSVVQVLK 275

Qy 257 RLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSN 316 

: I : : : I : : : : I II I : : I I I I : I I I I : I I : I : I : I I II 
Db 276 KLSQKGKTVILTIHQPSSELFELFDKILLMAEGRVAFLGTPGEAVDFFSYIGATCPTNYT 335

Qy 317 PADFYVDLTSIDRRSKEREVATVEKAQSLAALF LEKVQGFDDFLW 361 

I I I I I I : : : III::: : I I I I I 
Db 336 PADFYVQVLAVVPGREVESRDRVAKICDNFAVGKVSREMEQNFQKLVKSNGFG 388

Qy 362 KAEAKELNTSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGS 421 

:: I I: I : I I :: I : :: : : 

Db 389 KEDENEYTYKASWFM QFRAVLWRSWLSVLKEPLLVKVRLL 428

Qy 422 EACLMS LI I GFL YYGHGAKQLS FMDTAALLFMI GALI P FNVI LDWS KCHS ERSMLYYEL 481 

:::::: I I :: I I : I : : I : : I : : : I : I 

Db 429 QTTMVAVLIGLIFLGQQLTQVGVMNINGAIFLFLTNMTFQNSFATITVFTTELPVFMRET 488

Qy 482 EDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLV 537 

II 111:111 : : I I I I I : I I I I I 

Db 489 RSRLYRCDTYFLGKTIAELPLFLVVPFLFTAIAYPLIGLRPGVDHFFTALALVTLVANVS 548

Qy 538 VFC- CRTMALAASAMLPT FHMS S FFCNALYNS FYLTAGFMINLDNLWI VPAWI S 590 

: I I : : : I I I : I I I I : I : : : I : I 

Db 549 TSFGYLISCACSSTSMALSVGPP VI I PFLLFGGFFLNSGSVPVYFKWLS 597 

Qy 591 KLSFLRWCFSGLMQIQF NGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIV 646 

I I: I: II: |: I : I : I I |: :::::: I :: : 

Db 598 YLSWFRYANEGLLINQWADVKPGEI-TCTLSNTTCPSSGEVILETLNFSASDLPFDFIGL 656 

Qy 647 IGISYGFLFLYYLSLKL 663

: II I : : I : 

Db 657 ALLIVGFRISAYIALTM 673



RESULT 13 
WHIT_LUCCU 

ID WHIT_LUCCU STANDARD; PRT; 677 AA. 

AC Q05360; 

DT 01-FEB-1995 (Rel. 31, Created) 



DT 01-NOV-1997 (Rel. 35, Last sequence update) 

DT 16-OCT-2001 (Rel. 40, Last annotation update) 

DE White protein. 

GN W. 

OS Lucilia cuprina (Greenbottle fly) (Australian sheep blowfly).

OC Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; Pterygota;

OC Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha; Oestroidea; 

OC Calliphoridae; Lucilia. 

OX NCBI_TaxID=7375; 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE=97087158; PubMed=8933176; 

RA Garcia R.L., Perkins H.D., Howells A.J.; 

RT "The structure, sequence and developmental pattern of expression of 

RT the white gene in the blowfly Lucilia cuprina."; 

RL Insect Mol. Biol. 5:251-260(1996). 

RN [2] 

RP SEQUENCE OF 490-584 FROM N.A. 

RX MEDLINE=90264941; PubMed=1971656; 

RA Elizur A., Vacek A.T., Howells A.J.;

RT "Cloning and characterization of the white and topaz eye color genes 

RT from the sheep blowfly Lucilia cuprina."; 

RL J. Mol. Evol. 30:347-358(1990). 

CC -!- FUNCTION: May be part of a membrane-spanning permease system 
CC necessary for the transport of pigment precursors into pigment 

CC cells responsible for eye color. 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein. 

CC -!- SIMILARITY: Belongs to the ABC transporter family. MDR subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; U38899; AAA82057.1; -. 

DR EMBL; X53265; CAA37365.1; -. 

DR InterPro; IPR003593; AAA_ATPase. 

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR005284; Pigment_permease.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR TIGRFAMs; TIGR00955; 3a01204; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW Pigment; ATP-binding; Transmembrane; Transport. 

FT NP_BIND 119 126 ATP (POTENTIAL).

FT TRANSMEM 431 451 POTENTIAL. 

FT TRANSMEM 456 476 POTENTIAL. 

FT TRANSMEM 506 526 POTENTIAL. 

FT TRANSMEM 534 554 POTENTIAL. 

FT TRANSMEM 563 583 POTENTIAL. 

FT TRANSMEM 647 667 POTENTIAL. 

SQ SEQUENCE 677 AA; 75365 MW; D16FC11C97EED51D CRC64; 



Query Match 17.4%; Score 606.5; DB 1; Length 677; 

Best Local Similarity 27.8%; Pred. No. 2.4e-37; 

Matches 190; Conservative 115; Mismatches 277; Indels 101; Gaps 17; 

Qy 14 GTVLQDASGLQDSLFSSESDNSLYFTYSGQSNTLEVRDLTY QVDIASQV PWF 65 

I I : : I I : hill II III : I : : I I 

Db 29 GTL--EASAINSGF--SKSYGSLV SNESASEKLTYSWCNLDVFGEVHQPGSNWK 78

Qy 66 EQLAQFK 1 PW-RSHS SQDSCELGI RNLS FKVRSGQMLAI I GS SGCGRASLLD 116 

: : : I II I I |:|: |::||::||M |: :||: 

Db 79 QLVNRVKGVFCNERHI PKPRKHL 1 KNVC GVAY P GE L LAVMG S S GAGKT T L LN 130 

Qy 117 VITGRGHGGKM--KSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMR 174

: I I I : I I I : : : I : I : I I : : I I I I I I I : I 

Db 131 AIAFRSARGVQISPSSVT^LNGHPVDAKEMQARCAWQQDDLFIGSLTAREHLIFQATVR 190 

Qy 175 LPRTFSQAQRDKRVEDVIAELRLRQCANTRVG-NTYVRGVSGGERRRVSIGVQLLWNPGI 233 

: I I I : I I : : I I : I I : I I : I I I : I I : I : I I I I I : I : : : I : I : 

Db 191 MPRTMTQKQKLQRVDQVIQDLSLIKCQNTIIGVPGRVKGLSGGERKRLAFASEALTDPPL 250 

Qy 234 LILDEPTSGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIY 293 

I I I I I I I I I I I I I :: I I : I : : : I : : : : I I I I : : I I I I : I I I I : 

Db 251 LICDEPTSGLDSFMAASVVQVLKKLSQRGKTVILTIHQPSSELFELFDKILLMAEGRVAF 310

Qy 294 L GAAQQMVQ Y FT S I GH P C P R YS N PAD FYV D LT S I D RRS KE REVAT VE KAQ S 344 

II : I : I : I I I I I I I I I I I : : I II I I : II 

Db 311 LGTPVEAVDFFSFIGAQCPTNYNPADFYVQVLAVVPGREIESRDRISKICDNFAVGKVSR 370

Qy 345 LAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLTLTQDTDCGTAVELPGMIEQFSTLIRR 404

III | : : : | | : I I :: I 

Db 371 EMEQNFQKIAAKTDGLQKDDET TILYKASWFTQFRAIMWR 410 

Qy 405 QISNDFRDLPTLLIHGSEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVIL 464 

: :: : : :::::: | | :: I : I : : I : : I : 

Db 411 SWISTLKEPLLVKWLIQTTMVAVXIGLIFLNQPMTQVGVMNIN 470 

Qy 465 DWS KCHS ERSMLYYELEDGL YTAGP YFFAKI LGELPEHCAYVI I YAMP I YWLTNLRPVP 524 

I : : II: I II II Mill : : I : I I I 

Db 471 AVINVFTSELPVFMRETRSRLYRCDTYFLGKTLAELPLFLWPFLFIAIAYPMIGLRPGI 530 

Qy 525 ELFLLHFLLVWLV VFCCRTMALAAS AMLPT FHMS S FFCNAL YNS FYLTAG 574 

II II I I : I I I :: I I I I I 

Db 531 THFLSALALVTLVANVSTSFGYLISCASTSTSMALSVGP PLTIPFLLFGG 580 

Qy 575 FMINLDNLWIVPAWISKLSFLRWCFSGLMQIQFNGHLYTTQIGNFTFSILGDTMISA 631 

:| :: : |:| |: |: ||: |: III: I |: 

Db 581 VFLNSGSVPVYFKWLSYFSWFRYANEGLLINQW ADVQPGEITCTSTNTTCPSSGXV 636 

Qy 632 MDLNSHPLYAIYLIVI 647 

I : I I : I : : : 
Db 637 XLETLNFRDKFTFRLYGLILLIL 659 



RESULT 14 
WHIT_ANOGA 

ID WHIT_ANOGA STANDARD; PRT; 695 AA.



AC Q27256; Q17006; 

DT 01-NOV-1997 (Rel. 35, Created) 

DT 01-NOV-1997 (Rel. 35, Last sequence update) 

DT 16-OCT-2001 (Rel. 40, Last annotation update) 

DE White protein. 

GN W. 

OS Anopheles gambiae (African malaria mosquito).

OC Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; Pterygota; 

OC Neoptera; Endopterygota; Diptera; Nematocera; Culicoidea; Anopheles.

OX NCBI_TaxID=7165; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=Suakoko / G3; 

RX MEDLINE=96423158; PubMed=8825759;

RA Besansky N.J., Bedell J.A., Benedict M.Q., Mukabayire O., Hilfiker D.,

RA Collins F.H.;

RT "Cloning and characterization of the white gene from Anopheles

RT gambiae.";

RL Insect Mol. Biol. 4:217-231(1995). 

CC -!- FUNCTION: May be part of a membrane-spanning permease system
CC necessary for the transport of pigment precursors into pigment 

CC cells responsible for eye color. 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein. 

CC -!- SIMILARITY: Belongs to the ABC transporter family. MDR subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; U29486; AAC46995.1; -. 

DR EMBL; U29485; AAC46994.1; -. 

DR EMBL; U29484; AAC47423.1; -.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR008965; Cellul_bind.

DR InterPro; IPR005284; Pigment_permease.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR TIGRFAMs; TIGR00955; 3a01204; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW Pigment; ATP-binding; Transmembrane; Transport. 



FT NP_BIND 133 140 ATP (POTENTIAL).

FT NP_BIND 288 295 ATP (POTENTIAL).

FT TRANSMEM 444 464 POTENTIAL.

FT TRANSMEM 474 494 POTENTIAL.

FT TRANSMEM 524 544 POTENTIAL.

FT TRANSMEM 552 572 POTENTIAL.

FT TRANSMEM 581 601 POTENTIAL.

FT TRANSMEM 669 689 POTENTIAL.

FT CARBOHYD 472 472 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 645 645 N-LINKED (GLCNAC...) (POTENTIAL).

FT CONFLICT 100 100 N -> S (IN REF. 1; AAC47423).

FT CONFLICT 691 693 SRS -> YAR (IN REF. 1; AAC47423).

SQ SEQUENCE 695 AA; 77218 MW; EE8B9517239B2961 CRC64;



Query Match 17.1%; Score 598.5; DB 1; Length 695; 

Best Local Similarity 28.5%; Pred. No. 9.7e-37; 

Matches 151; Conservative 113; Mismatches 220; Indels 45; Gaps 10; 

Qy 88 IRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGG-KMKSGQI-WINGQPSTPQLVR 145 

::|:: : I I : : I I : : I I I I I : : I I : : I I I : : :|| I : :l 

Db 116 LKNWGVAKS GELLAVMGS S GAGKTTLLNALAFRS P PGVKI S PNAVRALNGVPVNAEQLR 175 

Qy 146 KCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCANTRV 205

I : I : I I : I : I I I I I I I : I : I : : I I : : I : I I I : I I : I : 

Db 176 ARCAYVQQDDLFIPSLTTREHLLFQAMLRMGRDVPASVKQHRVQEVXQELSLVKCADTII 235

Qy 206 GNT-YVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRL 264 

I : : | : | | | | | : | : : : I : I : I : I I I I I I I I I I I I : : : I : I : 

Db 236 GAPGRIKGLSGGERKRLAFASETLTDPHLLLCDEPTSGLDSFMAHSVLQVLKGMAMKGKT 295 

Qy 265 VLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPADFYVDL 324

:::::||| |::: Ml :||: I MM I ::|: :| Ml MINI I : 
Db 296 IILTIHQPSSELYCLFDKILLVAEGRVAFLGSPYQSAEFFSQLGIPCPPNYNPADFYVQM 355 

Qy 325 TSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTSTHTVSLTLTQDTD 384 

: I : I I I : : M : II MM I h 

Db 356 LAI-APAKEAECRDM 1 KKI - - CDS FAVS P I AREV LETASV 392 

Qy 385 CGTAVELPGMIE QFSTLIRRQI SNDFRDLPTLLIHGSEACLMSL 428 

| : : | | : : I I : : I : : I : : : : : : 

Db 393 AGKGMDEPYMLQQVEGVGSTGYRSSWWTQFYCILWRSWLSVLKDPMLVKVRLLQTAMVAT 452 

Qy 429 IIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDWSKCHSERSMLYYELEDGLYTA 488 

: I I : I : I I I : I I : : | : I : : : I : I II 

Db 453 LIGSIYFGQVLDQDGVMNINGSLFLFLTNMTFQNVFAVINVFSAELPVFLREKRSRLYRV 512 

Qy 489 GPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLWFCCRTMALAA 548

II I : III I :: I : II : I MM : 

Db 513 DTYFLGKTIAELPLFIAVPFVFTSITYPMIGLRTGATHYLTTLFIVTLVANVSTSFGYLI 572 

Qy 549 SAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRW 597

I : I : : I : I I M : MM MM I 
Db 573 SCASSSISMALSVGPPWIPFLIFGGFFLNSAS VPAYFKYLSYLSW 618 



RESULT 15 
YPC3_CAEEL 

ID YPC3_CAEEL STANDARD; PRT; 598 AA. 

AC Q11180; 

DT 01-NOV-1997 (Rel. 35, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 

DT 28-FEB-2003 (Rel. 41, Last annotation update) 

DE Putative ABC transporter C05D10.3 in chromosome III. 

GN C05D10.3. 

OS Caenorhabditis elegans.

OC Eukaryota; Metazoa; Nematoda; Chromadorea; Rhabditida; Rhabditoidea; 
OC Rhabditidae; Peloderinae; Caenorhabditis. 



OX NCBI_TaxID=6239; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=Bristol N2;

RA Du Z.;

RL Submitted (AUG-1994) to the EMBL/GenBank/DDBJ databases.

RN [2] 

RP REVISIONS. 

RA Waterston R.;

RL Submitted (SEP-2001) to the EMBL/GenBank/DDBJ databases.

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Potential). 

CC -!- SIMILARITY: Belongs to the ABC transporter family. MDR subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; U13645; AAA20989.2; -. 

DR WormPep; C05D10.3; CE29170. 

DR InterPro; IPR003593; AAA_ATPase. 

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR005284; Pigment_permease.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1. 

DR TIGRFAMs; TIGR00955; 3a01204; 1. 

DR PROSITE; PS00211; ABC_TRANSPORTER_1; FALSE_NEG.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW Hypothetical protein; ATP-binding; Transmembrane; Transport. 



FT   NP_BIND      27     34       ATP (POTENTIAL).
FT   TRANSMEM    336    356       POTENTIAL.
FT   TRANSMEM    425    445       POTENTIAL.
FT   TRANSMEM    453    473       POTENTIAL.
FT   TRANSMEM    478    498       POTENTIAL.
SQ   SEQUENCE   598 AA;  66906 MW;  9D6414E06898E343 CRC64;



Query Match 17.1%; Score 596.5; DB 1; Length 598; 

Best Local Similarity 27.5%; Pred. No. 1.1e-36;

Matches 166; Conservative 121; Mismatches 264; Indels 53; Gaps 13; 

Qy 88 IRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKMKSGQIWINGQPSTPQLVRKC 147 

: I : I I I : : I I I : I M I I : : I : : I : I I I I I : I : : : I : 

Db 10 LHNVSGMAESGKLLAILGSSGAGKTTLMNVLTSRNLTNLDVQGSILIDGRRANKWKIREM 69

Qy 148 VAHVRQHDQLLPNLTVRETLAFIAQMRL-PRTFSQAQRDKRVEDVIAELRLRQCANTRVG 206 

11:111 : : I I I I I : I : : I : : : I : I I I I I : : : I : : I I : I : I 
Db 70 SAFVQQHDMFVGTMTAREHLQFMARLRMGDQYYSDHERQLRVEQVLTQMGLKKCADTVIG 129 

Qy 207 -NTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVTTLSRLAKGNRLV 265

: : I : I I I : : I : I : : I III I I I I I I I I : I I : : I I II I 
Db 130 IPNQLKGLSCGEKKRLSFASEILTCPKILFCDEPTSGLDAFMAGHWQALRSLADNGMTV 189 



Qy 266 LISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPCPRYSNPADFYVDLT 325

Db 190 IITIHQPSSHVYSLFNNVCLMACGRVIYLGPGDQAVPLFEKCGYPCPAYYNPADHLI 246

Qy 326 SIDRRSKEREVATVEKAQSLAALFLEKV-QGF DDFLWKAEAKELN TSTH 373

I : I : : : : : : 1 : 1 II 1 1 : 1 1 : 

Db 247 RTLAVIDSDRATSMKTISKIRQGFLSTDLGQSVLAIGNANKLRAASFVTGSD 298

Qy 374 TVSLTLT QDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMSLII 430

| || ||: 1 1 1 1 II 1 : : : : 1 

Db 299 TSEKTKTFFNQDYNA SFWTQFLALFWRSWLTVI RDPNLLSVRLLQI LITAFIT 351

Qy 431 GFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDWSKCHSERSMLYYELEDGLYTAGP 490

| : : : : : : : 1 : 1 : 1 : | : : 1 : 1 : 1 

Db 352 GIVFFQTPVTPATIISINGIMFNHIRNMNFMLQFPNVPVITAELPIVLRENANGVYRTSA 411

Qy 491 YFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLWFCCRTMALAASA 550

| | | | : I 1 1 : : hi : I I : : 1 1 : : 1 1 : 1 : : 1 

Db 412 YFLAKNIAELPQYIILPILYNTIVYWMSGLYPNFWNYCFASLVTILITNVAISISY 467

Qy 551 MLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCFSGLMQIQ 606

: | : : : I : II 1 1 : | : | | | : : : : I : 

Db 468 AVATIFANTDVAMTILPIFWPIMAFGGFFITFDAIPSYFKWLSSLSYFKYGYEALAINE 527

Qy 607 FNGHLYTTQIGNFTFSILGDTMISAMDLN-SHPLYAIYLIVIGISYGFLFLY 657

:: : 1 : : | : : : : | : I I : : 1 1 : 1 : 1 : 

Db 528 WDSIKVIPECFNSSMTAFALDSCPKNGHQVLESIDFSASHKIFDI-SILFGMFIGIRIIA 586



Qy 658 YLSL 661 

|::| 

Db 587 YVAL 590 



Search completed: February 27, 2004, 07:12:37 
Job time : 12.4048 secs



